TOPIC 6: DATA DESCRIPTION

6.1 – Introduction to Data
6.2 – Measures of Location
6.3 – Measures of Dispersion

6.1 – Introduction to Data

Learning outcomes:
At the end of this topic, students should be able to:
(a) identify discrete and continuous data
(b) identify ungrouped and grouped data
(c) construct and interpret stem-and-leaf diagrams

Statistics involves collecting, organizing, presenting and analyzing data in order to obtain useful information for decision making.

POPULATION is the collection of all elements whose characteristics are being studied.

SAMPLE is a group of elements drawn from a population that is representative of that population. A sample is a subset of a population.

PARAMETER is a numerical measurement describing some characteristic of a population, e.g. mean, median, mode.

VARIABLE is a characteristic or attribute that can take different values, e.g. the height of a student.

Quantitative variables can be classified as discrete or continuous.

DATA
A collection of observations, measurements or information obtained from a study that is carried out.

• Quantitative data: data that can be measured numerically
    - Discrete data
    - Continuous data
• Qualitative data: data that cannot assume a numerical value but can be classified into categories

Discrete data
Discrete data are data that assume only distinct, countable values (typically integers).
E.g. the number of teachers in a school.

Continuous data
Continuous data are data that can assume any numerical value in a certain interval on the real line.
E.g. the heights of students in KMM: 130.2 cm, 132.5 cm, 131.8 cm, ...

Raw data can be represented as ungrouped data or grouped data.
(a) UNGROUPED DATA: data listed as a sequence or in the form of a frequency table, but without the use of intervals
(b) GROUPED DATA: data categorized into class intervals

Example
The following are the lengths of 12 leaves collected from a garden, measured to the nearest cm.
10 11 14 12 13 9 9 11 10 11 13 12

These data are called raw data.

The data can be summarized as a FREQUENCY DISTRIBUTION TABLE.

Length of leaf (cm)    9    10    11    12    13    14
Frequency              2     2     3     2     2     1

Data shown in the frequency distribution above are known as ungrouped data.

The frequency distribution below shows the same data grouped into intervals.

Length of leaf (cm)    9-10    11-12    13-14
Frequency              4       5        3

Data in the form of the frequency distribution table shown above are known as grouped data.

Stem and leaf diagram
• A stem-and-leaf diagram is another technique for illustrating quantitative data.
• Each value is divided into two parts: the stem and the leaf.
• The digit(s) in the greatest place value(s) of the data values are the stems.
• The digits in the next greatest place value are the leaves.
• For example, if all the data are two-digit numbers, the digit in the tens place is used as the stem and the digit in the ones place is used as the leaf.

Example 1
Construct a stem-and-leaf diagram for the data below:
12, 13, 21, 27, 33, 34, 35, 37, 40, 41

Steps for constructing a stem-and-leaf diagram

Step 1
• Separate each value into two parts, i.e. the stem and the leaf.
• Since each given value consists of two digits, the first digit is used as the stem and the second digit as the leaf. (When the values are large, the stem can consist of several digits.)

Step 2
• Draw a vertical line and list the stems on its left in order of magnitude, starting from the smallest.

Step 3
• List each leaf, i.e. the corresponding second digit, on the right of the vertical line.
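The three steps above can be sketched in Python. This is a minimal illustration, not part of the original notes: the helper names `stem_and_leaf` and `format_stem_and_leaf` are hypothetical, and the sketch assumes non-negative whole numbers with the ones digit as the leaf and everything above it as the stem.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group whole numbers by stem (value // 10) with ones digits as leaves.

    Assumption: values are non-negative integers.
    """
    groups = defaultdict(list)
    for value in sorted(values):          # sorting first keeps the leaves in order
        stem, leaf = divmod(value, 10)    # Step 1: split each value into stem and leaf
        groups[stem].append(leaf)
    return dict(groups)


def format_stem_and_leaf(values):
    """Return the diagram as text, one 'stem | leaves' row per stem."""
    groups = stem_and_leaf(values)
    rows = []
    for stem in sorted(groups):           # Step 2: stems from smallest to largest
        leaves = " ".join(str(leaf) for leaf in groups[stem])  # Step 3: list the leaves
        rows.append(f"{stem} | {leaves}")
    return "\n".join(rows)


data = [12, 13, 21, 27, 33, 34, 35, 37, 40, 41]
print(format_stem_and_leaf(data))
# 1 | 2 3
# 2 | 1 7
# 3 | 3 4 5 7
# 4 | 0 1
```

A stem with no data values is simply omitted by this sketch; a hand-drawn diagram would normally still show that stem with an empty row.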
Solution

Stem | Leaf
  1  | 2 3
  2  | 1 7
  3  | 3 4 5 7
  4  | 0 1

Example 2
Construct a stem-and-leaf diagram for the test scores of a group of students:
92, 92, 96, 98, 83, 85, 72, 74, 76, 78, 78, 79, 61, 64, 64, 67, 68, 50, 50, 52, 58, 58

Solution
Test scores out of 100

Stem | Leaf
  9  | 2 2 6 8
  8  | 3 5
  7  | 2 4 6 8 8 9
  6  | 1 4 4 7 8
  5  | 0 0 2 8 8

Based on the stem-and-leaf diagram:
• 4 students scored in the 90s on the test.
• 2 students received the same mark of 92.
• No marks below 50 were received.
• No mark of 100 was received.
Counting the total number of leaves gives the number of students who took the test.

Exercise
Construct your own stem-and-leaf diagram for the following June temperatures:
77 80 82 68 65 59 61 57 50 62 61 70 69 64 67
70 62 65 65 73 76 87 80 82 83 79 79 71 80 77

Solution
Temperatures

Stem | Leaf
  5  | 0 7 9
  6  | 1 1 2 2 4 5 5 5 7 8 9
  7  | 0 0 1 3 6 7 7 9 9
  8  | 0 0 0 2 2 3 7

6.2 – Measures of Location

Learning outcomes:
At the end of this topic, students should be able to:
(a) find and interpret the mean, mode and median for ungrouped data
(b) find and interpret the mean, mode, median, quartiles and percentiles for grouped data
(c) construct and interpret box-and-whisker plots

Data
• UNGROUPED DATA (data listed as a sequence or in a frequency table without intervals): mean, mode, median
• GROUPED DATA (data categorized into class intervals): mean, mode, median, quartiles and percentiles

Ungrouped Data

Mean
• The sum of the values of all observations divided by the total number of observations.
• Denoted by the symbol $\bar{x}$:

\[ \bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{\sum x}{n} \]

Example 1
a) Find the mean of the set of numbers 3, 5, 7, 4, 5, 9, 6.
b) Find the mean of the following set of data.

Number of male children    0    1    2    3    4    5
Frequency                  2    5    7    3    2    1

Solution 1(a)

Solution 1(b)

Median
• The middle value when a set of data is arranged in order of magnitude (ascending or descending).
• For a set of data $x_1, x_2, x_3, \dots, x_n$ arranged in order of magnitude, there are two cases.

CASE 1: the number of data, n, is odd
\[ \text{Median} = \left(\frac{n+1}{2}\right)\text{th value} \]

CASE 2: the number of data, n, is even
Median = mean of the two middle values

Example 2
Find the median for each of the following sets of data.
a) 180 186 191 201 209 219 220
b) 24 21 28 36 17 32 20
c) 3.56 8.61 4.35 2.71 5.48 6.22

Solution 2(a)

Solution 2(b)

Solution 2(c)

Mode
• The mode of a set of data is the value that occurs most frequently.

Example 3
Find the mode for each of the following sets of data.
a) 5, 2, 3, 3, 5, 4, 28, 5
b) 2, 3, 5, 8, 10
c) 0.2, 0.4, 0.4, 0.4, 0.5, 0.7, 0.7, 0.7, 0.5

Example 4
Find the mode for the following data:

x    20    33    40    52
f     4    10     6     7

(f = 10 is the highest frequency.)

Solution

Grouped Data

Mean
• For grouped data, the mean is given by

\[ \bar{x} = \frac{\sum fx}{\sum f} \]

• f = the frequency of each class
• x = the class mark, i.e. the mid-point of each class

Example 1
The table shows the distribution of the fat content of 40 pieces of food. Find the mean for the following distribution.
Fat content    Frequency
0.1 - 1.0       4
1.1 - 2.0       5
2.1 - 3.0       7
3.1 - 4.0      13
4.1 - 5.0      11

Solution 1

Fat content    Frequency, f    Class mark, x    fx
0.1 - 1.0       4
1.1 - 2.0       5
2.1 - 3.0       7
3.1 - 4.0      13
4.1 - 5.0      11

\[ \text{mean, } \bar{x} = \frac{\sum fx}{\sum f} \]

Median
• The median of a frequency distribution for grouped data can be estimated using the formula

\[ \text{Median} = L_k + \left(\frac{\frac{n}{2} - F_{k-1}}{f_k}\right) C \]

L_k = lower boundary of the median class
n = number of data, i.e. the sum of the frequencies
F_{k-1} = cumulative frequency before the median class
f_k = frequency of the median class
C = class width

Example 2
Find the median, given that the lengths of a sample of 90 leaves from a tree are recorded in the table (Figure 1):

Lengths (cm)    4–5    6–7    8–9    10–11    12–13    14–15
Frequency        2      6     14      31       30        7

Figure 1

Solution 2

Length (cm)    Frequency    Cumulative frequency
4–5             2
6–7             6
8–9            14
10–11          31
12–13          30
14–15           7

Mode

\[ \text{Mode} = L_B + \left(\frac{d_1}{d_1 + d_2}\right) C \]

L_B = lower class boundary of the mode class
d_1 = the difference between the frequency of the mode class and the frequency of the PREVIOUS class
d_2 = the difference between the frequency of the mode class and the frequency of the class AFTER the mode class
C = class width

Example 3
The table below shows the distribution of the heights of 30 plants of type B which have been planted for 6 weeks. The heights are measured to the nearest cm. Estimate the mode of this distribution.

Heights (cm)    3–5    6–8    9–11    12–14    15–17    18–20
f                1      2     11       10        5        1

(The class 9–11, with f = 11, is the mode class.)

Solution 3

\[ \text{Mode} = L_B + \left(\frac{d_1}{d_1 + d_2}\right) C \]

Quartiles

\[ Q_k = L_k + \left(\frac{\frac{k}{4} n - F_{k-1}}{f_k}\right) C_k, \qquad k = 1, 2, 3 \]

L_k = lower class boundary of the class containing the quartile
F_{k-1} = cumulative frequency before the class containing the quartile
n = the number of data
f_k = frequency of the class containing the quartile
C_k = class width of the class containing the quartile

Example 4
The table shows the marks of 250 pre-university students in an examination.

Marks              0-9    10-19    20-29    30-39    40-49    50-59    60-69    70-79
No. of students    15     20       25       24       12       31       71       52

Estimate the:
a) first quartile
b) third quartile

Solution 4(a)

Solution 4(b)

Percentiles
Percentiles divide the data set into 100 equal parts. A percentile can be obtained using the formula below:

\[ P_k = L_k + \left(\frac{\frac{k}{100} n - F_{k-1}}{f_k}\right) C_k, \qquad k = 1, 2, 3, \dots, 99 \]

where
L_k = lower class boundary of the class containing the percentile
F_{k-1} = cumulative frequency BEFORE the class containing the percentile
n = the number of data
f_k = frequency of the class containing the percentile
C_k = class width of the class containing the percentile

NOTES
• The 25th percentile is called the first quartile, Q1: P25 = Q1
• The median is the 50th percentile, also called the second quartile, Q2: Median = P50 = Q2
• The 75th percentile is called the third quartile, Q3: P75 = Q3

Example 5
The following table shows the weekly pocket money of 50 students in a secondary school.

Pocket money (RM)    20 < x < 25    25 < x < 30    30 < x < 35    35 < x < 40    40 < x < 45
f                    10             15             16             5              4

Find the 40th and 90th percentiles.

Solution 5

Pocket money (RM)    f     Cumulative frequency
20 < x < 25          10
25 < x < 30          15
30 < x < 35          16
35 < x < 40           5
40 < x < 45           4

Box and Whisker Plots
A box plot summarizes data using the median, the quartiles, and the extreme (least and greatest) values. It is used to provide a graphical display of the center and variation of a data set.

Construction of Box and Whisker Plots
Step 1: Arrange the data in order from least to greatest.
Step 2: Find the median, the quartiles, and the extreme (least and greatest) values.
Step 3: Connect the quartiles to each other to make a box, and then connect the box to the minimum and maximum with lines.

Example 1
Draw a box-and-whisker plot for the following set of data.
3, 5, 4, 2, 1, 6, 8, 11, 14, 13, 6, 9, 10, 7

Solution 1

Step 1: Arrange the data
Arrange the numbers from the least to the greatest:
1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11, 13, 14

Step 2: Find the median, quartile 1 and quartile 3
• Find the median from the ordered list: cross off one number from each end until you reach the middle number (or numbers).
• If there are two numbers in the middle, add them together: 6 + 7 = 13
• Then divide by 2: 13 ÷ 2 = 6.5
• The median is 6.5.
• Split the numbers into the left and right sides of the median:
  1, 2, 3, 4, 5, 6, 6 │ 7, 8, 9, 10, 11, 13, 14
• Find the median of each half:
  Left median = 4        Right median = 10
• The left median is called the LOWER QUARTILE, Q1.
• The right median is called the UPPER QUARTILE, Q3.

Step 3: Connect the quartiles to each other to make a box, and then connect the box to the minimum and maximum with lines.
• Draw a number line from the smallest to the largest number without skipping any numbers:
  1 2 3 4 5 6 7 8 9 10 11 12 13 14
• Put circles at the lower and upper quartiles (4 and 10).
• Draw a box connecting the circles at the lower and upper quartiles.
• Put a circle at the median (6.5).
• Draw a vertical line through the box at the median.
• Put circles at the high and low points (14 and 1).
• Draw lines that connect the high and low points to the box.
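The five values needed for the plot can be checked with a short Python sketch. It follows the median-of-halves method used in the worked steps. The function names `median` and `five_number_summary` are hypothetical, and the treatment of an odd number of data (leaving the median out of both halves) is an assumption, since the notes only show an even case and conventions differ between textbooks and software.

```python
def median(ordered):
    """Median of a list that is already sorted in ascending order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: mean of the two middle values


def five_number_summary(values):
    """Minimum, Q1, median, Q3 and maximum by the median-of-halves method."""
    ordered = sorted(values)                       # Step 1: arrange the data
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                         # values to the left of the median
    # Assumption: for odd n the median itself is excluded from both halves.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return {
        "min": ordered[0],
        "q1": median(lower),                       # lower quartile
        "median": median(ordered),
        "q3": median(upper),                       # upper quartile
        "max": ordered[-1],
    }


data = [3, 5, 4, 2, 1, 6, 8, 11, 14, 13, 6, 9, 10, 7]
print(five_number_summary(data))
# {'min': 1, 'q1': 4, 'median': 6.5, 'q3': 10, 'max': 14}
```

These five numbers are exactly the points marked on the number line: the box runs from Q1 to Q3, the line inside it sits at the median, and the whiskers reach to the minimum and maximum.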
Box and Whisker Plot
[Figure: the completed box-and-whisker plot for the data 3, 5, 4, 2, 1, 6, 8, 11, 14, 13, 6, 9, 10, 7, drawn on a number line from 1 to 14]

Symmetry and Skewness

1. Symmetrical distribution: the whiskers are the same length and the median is at the centre of the box.
[Figure: box plot with Q1, Q2 and Q3 marked]

2. Positively skewed distribution: the left whisker is shorter than the right whisker and the median is nearer to Q1.
[Figure: box plot with Q1, Q2 and Q3 marked]

3. Negatively skewed distribution: the left whisker is longer than the right whisker and the median is nearer to Q3.
[Figure: box plot with Q1, Q2 and Q3 marked]

6.3 – Measures of Dispersion

Learning outcomes:
At the end of this topic, students should be able to:
a) find and interpret the variance and standard deviation for ungrouped data
b) find and interpret the variance and standard deviation for grouped data
c) find and interpret Pearson's coefficient of skewness

Data
• UNGROUPED DATA: variance and standard deviation
• GROUPED DATA: variance and standard deviation

Variance and standard deviation for ungrouped data

For ungrouped data:

\[ \text{Mean, } \bar{x} = \frac{\sum x}{n} \]

\[ \text{Variance, } s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} \]

\[ \text{Standard deviation, } s = \sqrt{s^2} \]

Example 1
Find the mean, variance and standard deviation for the data below.
2, 7, 10, 9, 2, 5, 16

Solution 1

Exercise
Find the mean and standard deviation of the set of numbers 5, 2, 3, 8, 6.
Answer: Mean = 4.8, Standard deviation = 2.39

Example 2
A set of numbers {1, 6, 3, 2, 8, 5, x, y} has a mean of 4 and a variance of $\frac{36}{7}$. Show that x + y = 7 and hence find the values of x and y.

Solution 2

Variance and standard deviation for grouped data

\[ \text{Variance, } s^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1} \]

with x = class midpoint, f = frequency and n = $\sum f$

\[ \text{Standard deviation, } s = \sqrt{s^2} \]

Example 3
Find the mean, variance and standard deviation for the data below.
Marks            f
0 ≤ x < 20        9
20 ≤ x < 40      29
40 ≤ x < 60      42
60 ≤ x < 80      26
80 ≤ x < 100     14

Solution 3

Marks            f      Midpoint, x    fx    fx^2
0 ≤ x < 20        9
20 ≤ x < 40      29
40 ≤ x < 60      42
60 ≤ x < 80      26
80 ≤ x < 100     14
                 n = 120

Pearson Coefficient of Skewness
The Pearson coefficient of skewness provides a numerical measure of the skewness of a distribution. Denoted by $S_k$, it is calculated as follows:

\[ S_k = \frac{3(\text{mean} - \text{median})}{\text{standard deviation}} \qquad \text{OR} \qquad S_k = \frac{\text{mean} - \text{mode}}{\text{standard deviation}} \]

For moderately skewed distributions the two forms agree approximately, which gives the empirical relation

\[ 3(\text{mean} - \text{median}) \approx \text{mean} - \text{mode} \]

Note:
If $S_k$ is positive, the distribution is positively skewed.
If $S_k$ is negative, the distribution is negatively skewed.

Example 4
Find Pearson's coefficient of skewness for the following data.
1.2, 1.5, 1.9, 2.4, 2.4, 2.5, 2.6, 3.0, 3.5, 3.8

Solution 4

Example 5
The marks of 120 students in an exam are summarized as follows:

\[ \sum fx = 3108, \qquad \sum fx^2 = 82398, \qquad \text{Mode} = 27.6 \]

Find the mean, standard deviation and Pearson's coefficient of skewness for the distribution and interpret the result.

Solution 5

Exercise
The marks of 400 KMM students in the first quiz are given below.

Marks    Number of students
0-9      44
10-19    56
20-29    64
30-39    78
40-49    60
50-59    40
60-69    36
70-79    18
80-89     4

Estimate the mean, median and standard deviation for the above sample. By calculating Pearson's coefficient of skewness, state the type of distribution for the above data.

Answers:
mean = 35.3
median = 34.1
standard deviation = 20.1
Pearson's coefficient of skewness = 0.179 (skewed to the right)
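The exercise answers can be checked with a short Python sketch that applies the grouped-data formulas for the mean, median, standard deviation and Pearson's coefficient of skewness. The function name `grouped_summary` is hypothetical, and the sketch rests on two assumptions: class limits are given as (lower, upper) pairs with the same gap between neighbouring classes, and class boundaries lie halfway across that gap (so the class 30-39 has boundaries 29.5 and 39.5).

```python
from math import sqrt


def grouped_summary(limits, freqs):
    """Estimate mean, median, standard deviation and Pearson's Sk for grouped data.

    limits: list of (lower_limit, upper_limit) pairs, e.g. (30, 39)
    freqs:  class frequencies in the same order
    """
    gap = limits[1][0] - limits[0][1]              # e.g. 10 - 9 = 1 between neighbouring classes
    n = sum(freqs)
    mids = [(lo + hi) / 2 for lo, hi in limits]    # class marks (midpoints)

    sum_fx = sum(f * x for f, x in zip(freqs, mids))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, mids))
    mean = sum_fx / n
    variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
    sd = sqrt(variance)

    # Locate the median class: the first class whose cumulative frequency reaches n/2.
    cumulative = 0
    for (lo, hi), f in zip(limits, freqs):
        if cumulative + f >= n / 2:
            lower_boundary = lo - gap / 2
            width = hi - lo + gap
            median = lower_boundary + (n / 2 - cumulative) / f * width
            break
        cumulative += f

    skewness = 3 * (mean - median) / sd            # Pearson's coefficient, median form
    return mean, median, sd, skewness


limits = [(0, 9), (10, 19), (20, 29), (30, 39), (40, 49),
          (50, 59), (60, 69), (70, 79), (80, 89)]
freqs = [44, 56, 64, 78, 60, 40, 36, 18, 4]

mean, median, sd, skewness = grouped_summary(limits, freqs)
print(round(mean, 1), round(median, 1), round(sd, 1), round(skewness, 3))
# 35.3 34.1 20.1 0.177
```

With unrounded intermediate values the coefficient comes out at about 0.177. The quoted answer of 0.179 follows from substituting the rounded values: 3(35.3 - 34.1) / 20.1 ≈ 0.179. Either way the coefficient is positive, so the distribution is skewed to the right.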