F7 Mathematics and Statistics Chapter 10 and 11 Basic statistical measure (F7-MS-Ch10and 11)

10 Basic statistical measure

§10.0.1 Basic definitions of statistical measures

(1) Population Parameters and Sample Statistics
(2) Simple Random Sampling
(3) Population Size

(1) Population Parameters and Sample Statistics

Population
A population is the collection of all possible observations that are of interest in a particular study.

*Population parameters
For example, the mean, variance, standard deviation, mode and median of a population are population parameters.

Sample
The part of the observations that is actually collected is called a sample of the population.

*Sample statistics
For example, the sample mean, sample variance and sample median are sample statistics.

Statistical inference
Estimating a population parameter by using the corresponding sample statistic is one aspect of statistical inference.

Sampling units or observational units
"Population" refers to the set of data (observations, measurements, etc.). For example, "population" may refer to the set of data that represents the weights of the observed objects. Sampling units or observational units are the units on which observations are made.

(2) Simple Random Sampling
The "RAN#" key on a calculator can be used to construct a random number table such as the following:

0.871 0.843 0.874 0.237 0.451 0.583 0.201 0.199
0.565 0.298 0.532 0.932 0.508 0.710 0.900 0.561
0.119 0.206 0.364 0.814 0.770 0.830 0.661 0.366
0.962 0.727 0.481 0.964 0.980 0.690 0.484 0.703

(3) Population Size (n 10)
[Notice: The size of a population need not be large.]

§10.0.2 Discrete and Continuous Variables

(i) Discrete Variable
(ii) Continuous Variable

(i) Discrete Variable
These are the marks obtained by 30 pupils in a test:

6 3 5 9 0 1 8 5 6 7
4 4 3 1 0 2 2 7 10 9
7 5 4 6 6 2 1 0 8 8

Other examples of discrete data are the number of cars passing a checkpoint in a certain time, the shoe sizes of the children in a class, and the number of tomatoes on each of the plants in a greenhouse.

(ii) Continuous Variable
These are the heights of 20 children in a school. The heights have been measured correct to the nearest cm.

133 131 130 134 136 127 131 135 120 141
125 137 138 127 144 133 133 143 128 129

For example, 144 cm (correct to the nearest cm) could have arisen from any value in the range $143.5\ \text{cm} \le h < 144.5\ \text{cm}$.

Other examples of continuous data are the speeds of vehicles passing a particular point, the masses of cooking apples from a tree, and the time taken by each child in a class to perform a task.

**Continuous data cannot assume exact values; they can be given only within a certain range or measured to a certain degree of accuracy.**

Frequency polygons and curves are particularly useful for showing the general shapes of frequency distributions.

[Figures: examples of continuous distributions with different typical shapes.]

§10.1 Grouped and ungrouped data

Grouped data
Data with different values are classified under the same heading. Data are grouped because they vary too widely or because there are too many values to deal with individually. However, when data are grouped, the original information before grouping is lost. Therefore, quantities such as the mean and the standard deviation have to be estimated after grouping.
Hence, errors may result; this is called loss of information.

Example 1
The table below shows the attendance distribution of class 7A of a certain school in a certain month:

No. of students present (x)   19  20  21  22  23  24  25  26
No. of days (f)                1   2   3   3   6   5   3   2

We may change it to grouped data by combining consecutive values as follows:

No. of students present (x)   19 – 20   21 – 22   23 – 24   25 – 26
No. of days (f)                   3         6        11         5

For the ungrouped data, the mean number of students present per day is

$\bar{x} = \dfrac{19 + 2(20) + 3(21) + 3(22) + 6(23) + 5(24) + 3(25) + 2(26)}{25} = \dfrac{573}{25} = 22.92$

For the grouped data, using the class mid-values, the mean number of students present per day is

$\bar{x} \approx \dfrac{3(19.5) + 6(21.5) + 11(23.5) + 5(25.5)}{25} = \dfrac{573.5}{25} = 22.94$

Hence, the means calculated from ungrouped data and from grouped data are not necessarily the same.

§10.2 Measure of central tendency

A measure of central tendency is a single value that may represent the whole set of data and that tends to lie centrally within the data when they are arranged in order of magnitude.

§10.2.1 Common types of measure of central tendency

(a) Arithmetic mean
The mean of all the data.

(b) Median
The middle value of the data when arranged in order of magnitude, i.e. the $\frac{n+1}{2}$th datum in order of magnitude, where n is the total frequency.

(c) Mode
The datum with the greatest number of occurrences.

Example 6
For the data 3, 4, 6, 7, 7, 10, 13, 14, 16, 17, 20, find the interquartile range.

n = 11
Position of Q1 = $\frac{1}{4}(11+1)$th = 3rd, so Q1 = 6
Position of Q2 = $\frac{2}{4}(11+1)$th = 6th, so Q2 = 10
Position of Q3 = $\frac{3}{4}(11+1)$th = 9th, so Q3 = 16

The interquartile range IQR = Q3 – Q1 = 16 – 6 = 10
Range = 20 – 3 = 17

Example 7
The numbers of students absent from a certain class on 12 consecutive days are as follows: 0, 3, 4, 7, 1, 9, 2, 11, 1, 2, 5, 0.
Determine the interquartile range.

Data in ascending order: 0, 0, 1, | 1, 2, 2, | 3, 4, 5, | 7, 9, 11

n = 12
Position of Q1 = $\frac{1}{4}(12+1)$th = $3\tfrac{1}{4}$th
Position of Q2 = $\frac{2}{4}(12+1)$th = $6\tfrac{1}{2}$th
Position of Q3 = $\frac{3}{4}(12+1)$th = $9\tfrac{3}{4}$th

Q1 = 1
Q2 = 2.5
Q3 = $5 + (7 - 5) \times \frac{3}{4}$ = 5 + 1.5 = 6.5

∴ The interquartile range = 6.5 – 1 = 5.5

(c) Percentile
The ith percentile of a distribution = the $\frac{i(n+1)}{100}$th item in ascending order of magnitude, where i = 1, 2, 3, ..., 99 and n is the total frequency.

Example 8
The distribution of the number of apples in 199 boxes is as follows:

Number of apples   100  101  102  103  104  105
Number of boxes     16   24   47   76   30    6

Find from the table
(a) the interquartile range
(b) the median
(c) the 20th percentile
(d) the 84th percentile

(d) Standard deviation
Given n data $x_1, x_2, x_3, \ldots, x_n$, the standard deviation is

$\sigma = \sqrt{\dfrac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n}} = \sqrt{\dfrac{x_1^2 + x_2^2 + x_3^2 + \cdots + x_n^2}{n} - \bar{x}^2}$

where $\bar{x} = \dfrac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$.

Remark
Variance: $\sigma^2 = \dfrac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n}$

Example 9
Find the standard deviation of each of the two sets of data:

(a) Data A: 2, 4, 5, 6, 8.
$\bar{x} = 5$
$\sigma = \sqrt{\dfrac{(2-5)^2 + (4-5)^2 + (5-5)^2 + (6-5)^2 + (8-5)^2}{5}} = 2$

(b) Data B: 3, 5, 5, 8, 9.
$\bar{x} = 6$
$\sigma = \sqrt{\dfrac{(3-6)^2 + (5-6)^2 + (5-6)^2 + (8-6)^2 + (9-6)^2}{5}} = 2.19$

Which set of data is more concentrated, using standard deviation as the criterion?
Data A is more concentrated, because it has the smaller standard deviation.

§10.3 Determination of measures of dispersion using grouped data

(a) Quartile and interquartile range
The ith quartile is the value below which 25i% of the total data lie, where i = 1, 2, 3.

(b) Percentile
The ith percentile is the value below which i% of the total data lie, where i = 1, 2, 3, ..., 99.
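The definitions in §10.3 amount to reading a value off the cumulative frequency polygon at a given cumulative frequency. The following is a minimal Python sketch of that reading, done by linear interpolation between class boundaries. It is an illustration only: the function name grouped_percentile is hypothetical, and the data are the rod lengths of Example 10 with class boundaries assumed to be 9.5, 14.5, ..., 34.5 cm (lengths taken as recorded to the nearest cm).

```python
from bisect import bisect_left


def grouped_percentile(boundaries, freqs, p):
    """Estimate the value below which p% of grouped data lie.

    boundaries: class boundaries, one more entry than freqs (assumed layout)
    freqs:      class frequencies
    p:          percentage between 0 and 100
    """
    # Cumulative frequency at each class boundary, starting from 0.
    cum = [0]
    for f in freqs:
        cum.append(cum[-1] + f)

    target = p / 100 * cum[-1]  # cumulative frequency to read off

    # First boundary whose cumulative frequency reaches the target.
    k = bisect_left(cum, target)
    if k == 0:
        return boundaries[0]

    # Linear interpolation along the polygon segment containing the target.
    lo, hi = boundaries[k - 1], boundaries[k]
    return lo + (target - cum[k - 1]) / (cum[k] - cum[k - 1]) * (hi - lo)


# Rod lengths of Example 10 (class boundaries are an assumption).
boundaries = [9.5, 14.5, 19.5, 24.5, 29.5, 34.5]
freqs = [5, 15, 50, 20, 10]

q1 = grouped_percentile(boundaries, freqs, 25)
q2 = grouped_percentile(boundaries, freqs, 50)
q3 = grouped_percentile(boundaries, freqs, 75)
print("Q1 =", q1, "median =", q2, "Q3 =", q3, "IQR =", q3 - q1)
```

With these assumed boundaries the sketch gives a median of about 22.5 cm and an interquartile range of about 5.75 cm. Values read from a hand-drawn polygon should be close to these but need not agree exactly.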
Example 10
The distribution of the lengths of 100 rods is as follows:

Length of rods (cm)   10 – 14   15 – 19   20 – 24   25 – 29   30 – 34
Number of rods            5        15        50        20        10

(a) Construct a cumulative frequency polygon for the data.
Estimate from the graph
(b) the median and the interquartile range of the lengths of the rods;
(c) the range of lengths within which the central 20% of the data lie.
(d) Estimate the mean length of the rods.

Example 11
The cumulative frequency distribution of the weights of a number of pigs is as follows:

Weight less than (kg)   110  120  130  140  150  160  170
Number of pigs            0   18   68  140  180  192  200

(a) Construct a frequency distribution for the data and hence estimate the mean and the standard deviation of the weights of the pigs.
(b) Draw a cumulative frequency polygon for the distribution of weights and hence estimate
(i) the interquartile range and the median weight of the pigs;
(ii) the modal class;
(iii) the range of weights within which the central 30% of the pigs lie.

Example 12
The following table shows the distribution of the times that a particular group of students need to solve a given crossword puzzle, correct to the nearest second:

Time (sec)           10 – 14   15 – 19   20 – 24   25 – 29   30 – 34   35 – 39
Number of students       5         8        12        38        10         7

(a) Complete the table.
(b) Draw a cumulative frequency polygon for the distribution.
From your graph, estimate
(c) the median and the interquartile range of the times;
(d) the number of students with times between 22 and 28 seconds.
(e) Find the mean and the standard deviation of the times without using the graph.
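For part (e) of Example 12, the mean and standard deviation of grouped data are usually estimated by treating every observation in a class as if it sat at the class mid-value. The Python sketch below follows that approach and uses the standard deviation formula of this chapter (divisor n). The function name grouped_mean_sd and the mid-values 12, 17, ..., 37 are assumptions made for illustration; the mid-values follow from taking the class 10 – 14 to cover 9.5 to 14.5 seconds.

```python
from math import sqrt


def grouped_mean_sd(midpoints, freqs):
    """Estimate the mean and standard deviation from class mid-values."""
    n = sum(freqs)
    # Mean: sum of f*x over n.
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    # Mean of squares: sum of f*x^2 over n.
    mean_sq = sum(f * x * x for x, f in zip(midpoints, freqs)) / n
    # sigma = sqrt(mean of squares - square of mean), divisor n.
    return mean, sqrt(mean_sq - mean ** 2)


# Times of Example 12 (mid-values are an assumption, see above).
midpoints = [12, 17, 22, 27, 32, 37]
freqs = [5, 8, 12, 38, 10, 7]

mean_time, sd_time = grouped_mean_sd(midpoints, freqs)
print(round(mean_time, 2), round(sd_time, 2))
```

Because the original times inside each class are unknown, these are estimates only, which is the loss of information described in §10.1.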
Example 13
The figure shows the cumulative frequency polygon of the weights, in kg, of a group of 100 students.

[Figure: cumulative frequency polygon of the weights of 100 students.]

(a) Use the graph paper provided to draw a histogram of the weights.
(b) Determine the interquartile range of the weights from the cumulative frequency polygon.
(c) Determine the mean weight from the histogram.

§10.4 Graphical representation of data

(a) Stem-and-leaf plot
A graphical method of presenting data in a sorted manner by listing the data in order of magnitude. The leading digits are called the stem and the trailing digits are called the leaf.

Example 14
The numbers of hours spent by 25 students in studying for a mathematics test are as follows:

11  9 25 21 18
25  9 32 29 19
19 19 22 12  6
30 19 15 19 42
25 10 19 25 12

(a) Copy and complete the following stem-and-leaf diagram for the above data:

Stem (in 10)   Leaf (in 1)
0              6 9 9
1              0 1 2 2 5 8 9 9 9 9 9 9
2
3
4
5

(b) Find the mode, the median and the interquartile range of the numbers of hours spent by the 25 students in studying for the mathematics test.

Example 15
A stem-and-leaf diagram for the test scores in mathematics of 30 students is shown below:

Stem (tens) 1 2 3 4 5 6 7 8 9
Leaf (digits) 0 0 3 0 1 2 4 8 2 4 2 3 2 6 5 2 3 2 8 6 5 3 9 7 7 4 8 9 4 8 9

(a) Find the mean, the median and the interquartile range of these scores.

mean = 59.7667 (to 4 d.p.)
Position of median = $\frac{30+1}{2}$th = $15\tfrac{1}{2}$th
Position of Q1 = $\frac{30+1}{4}$th = $7\tfrac{3}{4}$th
Position of Q3 = $\frac{3(30+1)}{4}$th = $23\tfrac{1}{4}$th

Q2 = (59 + 61) ÷ 2 = 60
Q1 = $50 - (50 - 49) \times \frac{1}{4}$ = 49.75
Q3 = 72
Interquartile range = 72 – 49.75 = 22.25

(b) If the score 73 is an incorrect record and the correct score is 43, which of the statistics will have different values?
Find the correct values of these statistics.

mean = 58.7667 (to 4 d.p.)
median = (58 + 59) ÷ 2 = 58.5
Q1 = $49 - (49 - 48) \times \frac{1}{4}$ = 48.75
Q3 = 72
Interquartile range = 72 – 48.75 = 23.25

Example 16
To study the distribution of the monthly salaries of the 50 employees of a large factory, a stem-and-leaf diagram is used. The first 45 salaries are represented in the diagram below. The remaining 5 salaries are: $4100, $16200, $7900, $9800, $7200.

Stem (in $1000) 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Leaf (in $100) 0 5 6 0 0 1 0 1 2 3 5 8 1 2 6 1 2 5 5 7 2 5 9 0 7 8 6 6 2 4 7 4 5 8 5 5 5 6 5 8 7 8 0 1 9

(a) Complete the stem-and-leaf diagram by adding the 5 salaries.
(b) Find the median and the interquartile range of the distribution of salaries.
(c) Why is the mean of the salaries so different from the median? Which measure, the mean or the median, is the more appropriate indicator of the average salary in this case? Why?
(d) Suppose the salaries are increased by 20% and then by a constant amount of $100. What will be the median and the interquartile range of the new salary distribution?

Example 17
To study the distribution of the monthly expenditure of the 27 employees of an organization, a stem-and-leaf diagram is used.

Stem (in $1000) 3 4 5 6 7 8 9 10 12 13 14 15
Leaf (in $100) 4 5 7 1 1 4 5 0 5 8 9 3 6 8 2 8 2 5 2 6 5 8 7 0 6 7 9

(a) Find the median and the interquartile range of the distribution of expenditure.

Position of median = $\frac{2(27+1)}{4}$th = 14th, so Q2 = $6600
Position of Q1 = $\frac{27+1}{4}$th = 7th, so Q1 = $4500
Position of Q3 = $\frac{3(27+1)}{4}$th = 21st, so Q3 = $10500
Interquartile range = Q3 – Q1 = $10500 – $4500 = $6000

(b) Without finding the mean, which measure of average, the mean or the median, would have the larger value? Explain your answer briefly.
Since there are many large values, which pull the mean upward, the mean should have the larger value.

(c) Suppose the expenditures are reduced by 20% and then by a constant amount of $300. What will be the median and the interquartile range of the new expenditure distribution?

New median = ($6600)(0.8) – $300 = $4980
New interquartile range = ($6000)(0.8) = $4800

(b) Box-and-whisker diagram
A diagrammatic method of showing the characteristic features of a set of data, including the minimum, the maximum, the quartiles, the median and the interquartile range. It may also show how extreme the data are by defining inner fences and outer fences.

Whisker
The distance between the lower quartile and the minimum value, and also the distance between the upper quartile and the maximum value of the distribution. These lengths measure, to some extent, how the smallest 25% and the largest 25% of the data are distributed.

Box
The portion showing the positions of the lower quartile, the median and the upper quartile. The length of the box indicates, to some extent, how the central 50% of the data are distributed.

Example 18
The marks scored by 11 students in an examination are as follows:
40, 53, 60, 63, 65, 66, 69, 70, 71, 77, 92
Draw a box-and-whisker diagram and find out whether there are any outliers.
Position of Q1 = $\frac{1}{4}(11+1)$th = 3rd, so Q1 = 60
Position of Q2 = $\frac{2}{4}(12)$th = 6th, so Q2 = 66
Position of Q3 = $\frac{3}{4}(12)$th = 9th, so Q3 = 71
IQR = 71 – 60 = 11

[Box-and-whisker diagram: minimum 40, Q1 = 60, Q2 = 66, Q3 = 71, maximum 92; the length of the box is the IQR.]

§10.5 Other examples

Example 19 (mean, standard deviation)
The following are two sets of data from an experiment, obtained by two different students:

Volume of acid measured (cm³)
Student A Student B 8 7 12 6 7 7 9 3 10 12 11 15 12 11 9 9 12 13 14 11

(i) What is the mean volume of acid measured by each student?
(ii) What is the standard deviation? Which set of results is more reliable?

Example 20 (sample mean and sample standard deviation)
Two machines, A and B, are used to pack biscuits. A sample of 10 packets was taken from each machine and the mass of each packet, measured to the nearest gram, was noted. Find the standard deviation of the masses of the packets in the sample from each machine. Comment on your answer.

Machine A (mass in g): 196, 198, 198, 199, 200, 200, 201, 201, 202, 205
Machine B (mass in g): 192, 194, 195, 198, 200, 201, 203, 204, 206, 207

Example 21
The annual salary bill of a small business company is as follows:

Director          $2,400,000
Manager           $2,000,000
Salesman          $300,000
Chief operator    $150,000
3 Operators       $100,000 each
2 Secretaries     $90,000 each
1 Apprentice      $80,000

Find the mode, the median and the mean salary of these 10 persons. Which of these would you regard as the best indicator of the average salary? Explain your answer.
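The three averages asked for in Example 21 can be checked with a short Python sketch that uses the standard statistics module. This is only one way to verify a hand calculation: the list name salaries is an assumption, and the three operators and two secretaries from the table are entered individually so that the list has 10 entries.

```python
from statistics import mean, median, mode

# Annual salaries of the 10 persons in Example 21, one entry per person.
salaries = (
    [2_400_000]        # director
    + [2_000_000]      # manager
    + [300_000]        # salesman
    + [150_000]        # chief operator
    + [100_000] * 3    # three operators
    + [90_000] * 2     # two secretaries
    + [80_000]         # apprentice
)

mean_salary = mean(salaries)      # sum of salaries divided by 10
median_salary = median(salaries)  # average of the 5th and 6th ordered values
mode_salary = mode(salaries)      # most frequent salary

print("mean:", mean_salary)
print("median:", median_salary)
print("mode:", mode_salary)
```

The two largest salaries pull the mean well above the median and the mode, which is the contrast the question asks you to explain.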