9 Basic statistical measure 基本統計學的量度

F7 Mathematics and Statistics
Chapter 10and 11 Basic statistical measure
F7-MS-Ch10and 11-1
10 Basic statistical measure 基本統計學的量度
§10.0.1 Basic Definition of statistical measure
(1) Population Parameters and Sample Statistics
(2) Simple Random Sampling
(3) Population Size

(1) Population Parameters and Sample Statistics
Population
A population is a collection of all possible observations that are of interest
in a particular study.
*Population Parameters
For example, the mean, variance, standard deviation, mode and median of a
population are population parameters.
Sample
The part of the observations that is actually collected is called a sample of the
population.
*Sample statistics
For example, the sample mean, sample variance and sample median are
sample statistics.
Statistical Inference
Estimating a population parameter by using the corresponding sample
statistics is one aspect of statistical inference.
Sampling units or observational units
“Population” refers to the set of data (observations, measurements, etc.). For
example, “population” may refer to the set of data that represents the weights
of the observed objects. Sampling units or observational units are the units on
which observations are made.
(2) Simple Random Sampling
Use the “RAN#” key to construct a random number table like this:
0.871  0.843  0.874  0.237  0.451  0.770  0.962  0.980
0.583  0.201  0.199  0.565  0.298  0.830  0.727  0.690
0.532  0.932  0.508  0.710  0.900  0.661  0.481  0.484
0.561  0.119  0.206  0.364  0.814  0.366  0.964  0.703
(3) Population Size
*Population Size(n  10)
[Notice: The size of a population need not be large.]
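As a rough illustration of simple random sampling, the sketch below draws a sample without replacement using Python's standard `random` module, and also shows one possible way to turn a random number from a table like the one above into a unit index. The function names, the population of 10 labelled units and the `int(u * N)` mapping are assumptions made for illustration; they are not part of the original notes.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k distinct units; every subset of size k is equally likely."""
    rng = random.Random(seed)  # seeded generator so a run can be repeated
    return rng.sample(population, k)

def index_from_random_number(u, population_size):
    """Map a random number 0 <= u < 1 to an index 0 .. population_size - 1."""
    return int(u * population_size)

# Hypothetical population of 10 sampling units labelled 0 to 9.
units = list(range(10))
sample = simple_random_sample(units, 3, seed=1)

# Five random numbers taken from the table, mapped to unit indices.
table_row = [0.871, 0.843, 0.874, 0.237, 0.451]
indices = [index_from_random_number(u, len(units)) for u in table_row]
print(sample, indices)
```

When sampling without replacement from a table, an index that has already been chosen is simply skipped and the next random number is used.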
§10.0.2 Discrete and Continuous Variables
(i) Discrete Variable
(ii) Continuous Variable
(i) Discrete Variable
These are the marks obtained by 30 pupils in a test:
6 3 5 9 0 1 8 5 6 7 4 4 3 1 0
2 2 7 10 9 7 5 4 6 6 2 1 0 8 8
Other examples of discrete data are
the number of cars passing a checkpoint in a certain time,
the shoe sizes of children in a class,
the number of tomatoes on each of the plants in a greenhouse.
(ii) Continuous Variable
These are the heights of 20 children in a school. The heights have been measured
correct to the nearest cm.
133  131  130  134  136
127  131  135  120  141
125  137  138  127  144
133  133  143  128  129
For example, 144 cm (correct to the nearest cm) could have arisen from any value
in the range 143.5 cm ≤ h < 144.5 cm.
Other examples of continuous data are
the speed of vehicles passing a particular point,
the masses of cooking apples from a tree,
the time taken by each of a class of children to perform a task.
**Continuous data cannot be recorded as exact values; they can only be given within
a certain range or measured to a certain degree of accuracy.**
Frequency polygons and curves are particularly useful for showing the
general shapes of frequency distributions. The figures above show some
examples of continuous distributions with different typical shapes.
§10.1 Grouped and ungrouped data 分組與不分組數據
Grouped data 分組數據
Data with different values are classified under the same heading (不同數值數據擺放在同一標題).
Data are grouped when the values vary too widely (太多元化) or when there are too
many data (太多數據) to deal with. However, when data are grouped, the original
information before grouping is lost (失去分組前原來資料). Measures such as the
mean (平均值) and standard deviation (標準差) can therefore only be estimated (估計)
after grouping. The resulting error (誤差) is called loss of information.
Example 1
The table below shows the distribution of the number of students present
(出席人數分佈) in class 7A of a certain school in a certain month:
No. of students present (x)   19   20   21   22   23   24   25   26
No. of days (f)                1    2    3    3    6    5    3    2
We may change it to grouped data by combining consecutive values (組合相鄰數值)
as follows:
No. of students present (x)   19 – 20   21 – 22   23 – 24   25 – 26
No. of days (f)                   3         6        11         5
In ungrouped data (不分組數據), the mean number of students present per day
(平均每天出席人數)
= [19 + 2(20) + 3(21) + 3(22) + 6(23) + 5(24) + 3(25) + 2(26)] / 25
= 22.92
In grouped data (分組數據), using the class mid-values, the mean number of
students present per day
= [3(19.5) + 6(21.5) + 11(23.5) + 5(25.5)] / 25
= 22.94
Hence, the means calculated from ungrouped data and from grouped data are not
necessarily the same.
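The comparison in Example 1 can be checked with a short Python sketch. The helper name `mean_from_frequencies` and the use of class mid-values for the grouped table are illustrative choices; the figures are those of Example 1.

```python
def mean_from_frequencies(values, freqs):
    """Weighted mean: sum of f*x divided by the total frequency."""
    total = sum(freqs)
    return sum(x * f for x, f in zip(values, freqs)) / total

# Ungrouped table from Example 1.
ungrouped_x = [19, 20, 21, 22, 23, 24, 25, 26]
ungrouped_f = [1, 2, 3, 3, 6, 5, 3, 2]

# Grouped table: each class is represented by its mid-value.
class_limits = [(19, 20), (21, 22), (23, 24), (25, 26)]
grouped_f = [3, 6, 11, 5]
mid_values = [(lo + hi) / 2 for lo, hi in class_limits]

ungrouped_mean = mean_from_frequencies(ungrouped_x, ungrouped_f)
grouped_mean = mean_from_frequencies(mid_values, grouped_f)
print(ungrouped_mean, grouped_mean)
```

The small difference between the two results is the loss of information caused by grouping.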
§10.2 Measure of central tendency 集中趨勢的量度
A measure of central tendency is a value which may represent the whole set of data
(代表整體數據) and which tends to lie centrally within the data when they are
arranged in order of magnitude (並有趨向在數據中心位置, 當依著數據排列).
§10.2.1 Common types of measure of central tendency 常見集中趨勢量度
(a) Arithmetic Mean 平均值
the sum of all the data divided by the number of data.
(b) Median 中位數
the middle value (中間位置數值) of the data when arranged in order of magnitude
(依著數據大小排列), i.e. the (n+1)/2 th datum in order of magnitude, where n is the
total frequency (總頻數).
(c) Mode 眾數
the datum with the greatest number of occurrences (出現次數最多).
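As a minimal sketch, the three measures can be computed with Python's standard `statistics` module. The data used here are the 30 test marks listed in §10.0.2; the variable names are illustrative only.

```python
import statistics

# Marks obtained by 30 pupils in a test (from the discrete-variable example).
marks = [6, 3, 5, 9, 0, 1, 8, 5, 6, 7, 4, 4, 3, 1, 0,
         2, 2, 7, 10, 9, 7, 5, 4, 6, 6, 2, 1, 0, 8, 8]

mean_mark = statistics.mean(marks)      # sum of the data / number of data
median_mark = statistics.median(marks)  # middle value of the sorted data
modes = statistics.multimode(marks)     # all values with the highest frequency

print(mean_mark, median_mark, modes)
```

`multimode` is used rather than `mode` because a data set may have more than one mode; here it returns a single value.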
Example 6
For the data: 3, 4, 6, 7, 7, 10, 13, 14, 16, 17, 20, find the interquartile range.
n = 11
position of Q1 = (1/4)(11 + 1)th = 3rd, so Q1 = 6
position of Q2 = (2/4)(11 + 1)th = 6th, so Q2 = 10
position of Q3 = (3/4)(11 + 1)th = 9th, so Q3 = 16
The interquartile range
IQR = Q3 – Q1 = 16 – 6 = 10
Range = 20 – 3 = 17
Example 7
The number of absences (缺席人數) of a certain class for 12 consecutive days is as
follows:
0, 3, 4, 7, 1, 9, 2, 11, 1, 2, 5, 0. Determine the interquartile range.
In ascending order: 0, 0, 1,| 1, 2, 2,| 3, 4, 5,| 7, 9, 11
n = 12
position of Q1 = (1/4)(12 + 1)th = 3¼th, so Q1 = 1
position of Q2 = (2/4)(12 + 1)th = 6½th, so Q2 = 2.5
position of Q3 = (3/4)(12 + 1)th = 9¾th, so Q3 = 5 + (3/4)(7 – 5) = 5 + 1.5 = 6.5
∴ The interquartile range = 6.5 – 1 = 5.5
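The position rule used in Examples 6 and 7 can be sketched in Python as follows. The function names are hypothetical, and linear interpolation between neighbouring items is assumed for fractional positions, as in the working for Q3 in Example 7.

```python
def value_at_position(sorted_data, pos):
    """Value at a 1-based position; fractional positions are interpolated."""
    if pos < 1 or pos > len(sorted_data):
        raise ValueError("position outside the data")
    lower = int(pos)        # whole part of the position
    frac = pos - lower      # fractional part of the position
    value = sorted_data[lower - 1]
    if frac > 0:
        value += frac * (sorted_data[lower] - value)
    return value

def quartile(data, k):
    """k-th quartile (k = 1, 2, 3) at position k(n + 1)/4."""
    ordered = sorted(data)
    return value_at_position(ordered, k * (len(ordered) + 1) / 4)

example_6 = [3, 4, 6, 7, 7, 10, 13, 14, 16, 17, 20]
example_7 = [0, 3, 4, 7, 1, 9, 2, 11, 1, 2, 5, 0]

iqr_6 = quartile(example_6, 3) - quartile(example_6, 1)
iqr_7 = quartile(example_7, 3) - quartile(example_7, 1)
print(iqr_6, iqr_7)
```

Other textbooks and software packages use slightly different quartile conventions, so results from a library routine may differ from the (n + 1) rule used here.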
(c) Percentile 百分位數
the ith percentile of a distribution = the (i/100)(n + 1)th item in ascending order
of magnitude, where i = 1, 2, 3, ……, 99 and n is the total frequency.
Example 8
The distribution of the number of apples in 199 boxes is as follows:
Number of apples   100   101   102   103   104   105
Number of boxes     16    24    47    76    30     6
Find from the table,
(a) the interquartile range
(b) the median
(c) the 20th percentile
(d) the 84th percentile
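One way to check answers to Example 8 is to expand the frequency table into a sorted list and read off the (i/100)(n + 1)th item. The sketch below assumes that position rule; the helper names `expand` and `percentile` are hypothetical.

```python
def expand(values, freqs):
    """Turn a frequency table into the full sorted list of observations."""
    return [x for x, f in sorted(zip(values, freqs)) for _ in range(f)]

def percentile(sorted_data, i):
    """i-th percentile at position i(n + 1)/100, interpolating if needed."""
    pos = i * (len(sorted_data) + 1) / 100
    if pos < 1 or pos > len(sorted_data):
        raise ValueError("position outside the data")
    lower = int(pos)
    frac = pos - lower
    value = sorted_data[lower - 1]
    if frac > 0:
        value += frac * (sorted_data[lower] - value)
    return value

apples = [100, 101, 102, 103, 104, 105]
boxes = [16, 24, 47, 76, 30, 6]
data = expand(apples, boxes)      # 199 observations

q1 = percentile(data, 25)         # 50th item
median = percentile(data, 50)     # 100th item
q3 = percentile(data, 75)         # 150th item
p20 = percentile(data, 20)        # 40th item
p84 = percentile(data, 84)        # 168th item
print(q3 - q1, median, p20, p84)
```

Because n + 1 = 200, every position here is a whole number and no interpolation is needed.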
Standard deviation 標準差
Given n data x1, x2, x3, ……, xn, the standard deviation is
$$\sigma = \sqrt{\frac{(x_1-\bar{x})^2 + (x_2-\bar{x})^2 + (x_3-\bar{x})^2 + \cdots + (x_n-\bar{x})^2}{n}} = \sqrt{\frac{x_1^2 + x_2^2 + x_3^2 + \cdots + x_n^2}{n} - \bar{x}^2}$$
where
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$$
Remark
variance 方差
$$\sigma^2 = \frac{(x_1-\bar{x})^2 + (x_2-\bar{x})^2 + (x_3-\bar{x})^2 + \cdots + (x_n-\bar{x})^2}{n}$$
Example 9
Find the standard deviation for the two sets of data:
(a) Data A: 2, 4, 5, 6, 8.
$$\bar{x} = 5, \qquad \sigma = \sqrt{\frac{(2-5)^2 + (4-5)^2 + (5-5)^2 + (6-5)^2 + (8-5)^2}{5}} = 2$$
(b) Data B: 3, 5, 5, 8, 9.
$$\bar{x} = 6, \qquad \sigma = \sqrt{\frac{(3-6)^2 + (5-6)^2 + (5-6)^2 + (8-6)^2 + (9-6)^2}{5}} = 2.19$$
Which set of data is more concentrated according to the criterion (準則) of standard
deviation?
Data A is more concentrated, since it has the smaller standard deviation.
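The two forms of the standard deviation formula (divisor n) can be checked against Example 9 with a few lines of Python. The function names are illustrative; only the standard `math` module is used.

```python
import math

def standard_deviation(data):
    """Square root of the mean squared deviation from the mean (divisor n)."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum((x - mean) ** 2 for x in data) / n)

def standard_deviation_shortcut(data):
    """Equivalent form: sqrt(mean of the squares minus square of the mean)."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum(x * x for x in data) / n - mean ** 2)

data_a = [2, 4, 5, 6, 8]
data_b = [3, 5, 5, 8, 9]

sd_a = standard_deviation(data_a)
sd_b = standard_deviation(data_b)
print(round(sd_a, 2), round(sd_b, 2))
```

The smaller value for Data A agrees with the conclusion that Data A is the more concentrated set.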
§10.3 Determination of measure of dispersion using grouped data
分組數據求離差的量度
(a) Quartile and interquartile range
the ith quartile is the value below which 25i% of the total data lie, where
i = 1, 2, 3.
(b) Percentile
the ith percentile is the value below which i% of the total data lie, where
i = 1, 2, 3, ……, 99.
Example 10
The distribution of the lengths of 100 rods is as follows:
Length of rods (cm)   10 – 14   15 – 19   20 – 24   25 – 29   30 – 34
Number of rods            5        15        50        20        10
(a)
Construct a cumulative frequency polygon (作一累積頻數多邊形) for the
data, and estimate from the graph,
(b)
the median and interquartile range of length of rods.
(c)
the range of length (長度範圍) within which the central 20% of data lies.
(d)
estimate the mean length of the rods.
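Reading a value off a cumulative frequency polygon amounts to linear interpolation within a class. The sketch below applies this to the table of Example 10 under two assumptions that the notes do not state: the lengths are recorded to the nearest cm, so the class boundaries are 9.5, 14.5, ……, 34.5; and the quartiles of grouped data are read at cumulative frequencies n/4, n/2 and 3n/4. The function name is hypothetical.

```python
def value_at_cumulative_frequency(boundaries, freqs, target):
    """Read the cumulative frequency polygon at height `target`."""
    cumulative = 0
    for i, f in enumerate(freqs):
        if f > 0 and cumulative + f >= target:
            lower, upper = boundaries[i], boundaries[i + 1]
            # Linear interpolation inside the class that contains the target.
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("target exceeds the total frequency")

# Assumed class boundaries for classes 10-14, 15-19, ..., 30-34.
boundaries = [9.5, 14.5, 19.5, 24.5, 29.5, 34.5]
freqs = [5, 15, 50, 20, 10]
n = sum(freqs)

median = value_at_cumulative_frequency(boundaries, freqs, n / 2)
q1 = value_at_cumulative_frequency(boundaries, freqs, n / 4)
q3 = value_at_cumulative_frequency(boundaries, freqs, 3 * n / 4)

# Central 20% of the data: from the 40th to the 60th percentile.
central_low = value_at_cumulative_frequency(boundaries, freqs, 40 * n / 100)
central_high = value_at_cumulative_frequency(boundaries, freqs, 60 * n / 100)

# Mean estimated from the class mid-values.
mid_values = [(a + b) / 2 for a, b in zip(boundaries, boundaries[1:])]
mean = sum(m * f for m, f in zip(mid_values, freqs)) / n
print(median, q3 - q1, (central_low, central_high), mean)
```

Values read from a hand-drawn graph will agree with these figures only approximately.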
Example 11
The cumulative frequency distribution (累積頻數分佈) of the weights of a number of
pigs is as follows:
Weight less than (kg)   110   120   130   140   150   160   170
Number of pigs            0    18    68   140   180   192   200
(a)
Construct a frequency distribution for the data and hence estimate the mean
and standard deviation of weights of pigs.
(b)
Draw a cumulative frequency polygon for the distribution of weights and
hence estimate.
(i)
the interquartile range and median weight of pigs.
(ii)
the modal class 眾數組
(iii)
the range of weights within which the central 30% of pigs lie.
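Part (a) of Example 11 can be sketched in Python: successive differences of the cumulative frequencies give the class frequencies, and the class mid-values are then used to estimate the mean and standard deviation. Using mid-values and the divisor n are assumptions consistent with the formulas earlier in these notes; the variable names are illustrative.

```python
import math

upper_bounds = [110, 120, 130, 140, 150, 160, 170]
cumulative = [0, 18, 68, 140, 180, 192, 200]

# Class frequencies are the differences of consecutive cumulative frequencies.
freqs = [b - a for a, b in zip(cumulative, cumulative[1:])]
classes = list(zip(upper_bounds, upper_bounds[1:]))   # (110, 120), ...
mid_values = [(lo + hi) / 2 for lo, hi in classes]

n = sum(freqs)
mean = sum(f * m for f, m in zip(freqs, mid_values)) / n
variance = sum(f * (m - mean) ** 2 for f, m in zip(freqs, mid_values)) / n
sd = math.sqrt(variance)

# Modal class: the class with the greatest frequency.
modal_class = classes[freqs.index(max(freqs))]
print(freqs, mean, round(sd, 2), modal_class)
```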
Example 12
The following table shows the distribution of the time a particular group of students
need to solve a given puzzle (填字遊戲), correct to the nearest second (以最接近秒為準):
Time (sec)           10 – 14   15 – 19   20 – 24   25 – 29   30 – 34   35 – 39
Number of students       5         8        12        38        10         7
(a) Complete the table.
(b) Draw a cumulative frequency polygon for the distribution.
From your graph, estimate
(c)
the median and the interquartile range of time.
(d)
the number of students with time between 22 and 28 seconds.
(e)
the mean and standard deviation of time without using the graph.
Example 13
The figure shows the cumulative frequency polygon of weight in kg for a group of
100 students.
(a)
Use the graph paper provided to draw a histogram (矩形圖) of the weights.
(b)
Determine the interquartile range of the weight from the cumulative frequency
polygon.
(c)
Determine the mean weight from the histogram.
§10.4 Graphical representation of data 數據的圖像表示法
(a)
Stem and leaf plot 幹葉圖
A graphical method of presenting data in sorted form (分類) by listing (列出) the
data in order of magnitude (依大小次序). The leading digits are called the stem (幹)
and the trailing digits are called the leaves (葉).
Example 14
The numbers of hours spent by 25 students in studying for a mathematics test are
shown as follows:
11   9  25  21  18
25   9  32  29  19
19  19  22  12   6
30  19  15  19  42
25  10  19  25  12
(a) Copy and complete the following stem-and-leaf diagram for the above data:
Stem (in 10) | Leaf (in 1)
0 | 6 9 9
1 | 0 1 2 2 5 8 9 9 9 9 9 9
2 |
3 |
4 |
5 |
(b) Find the mode, the median and the interquartile range of the numbers of hours
spent by the 25 students in studying for the mathematics test.
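A stem-and-leaf diagram can be built mechanically by sorting the data and splitting each value into a tens digit and a units digit. The sketch below does this for the 25 values of Example 14; the function name `stem_and_leaf` and the dictionary layout are illustrative choices.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Map each stem (tens digit) to its sorted list of leaves (units digit)."""
    rows = defaultdict(list)
    for x in sorted(data):
        rows[x // 10].append(x % 10)
    return dict(rows)

hours = [11, 9, 25, 21, 18, 25, 9, 32, 29, 19, 19, 19, 22, 12, 6,
         30, 19, 15, 19, 42, 25, 10, 19, 25, 12]

diagram = stem_and_leaf(hours)
for stem in sorted(diagram):
    print(stem, "|", " ".join(str(leaf) for leaf in diagram[stem]))
```

Because the leaves are listed in order, the median and the quartiles can be read off the diagram by counting positions.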
Example 15
A stem-and-leaf diagram for the test scores in mathematics of 30 students is shown
below:
Stem (tens)
1
2
3
4
5
6
7
8
9
(a)
Leaf (digits)
0
0
3
0
1
2
4
8
2
4
2
3
2
6
5
2
3
2
8
6
5
3
9
7
7
4
8
9
4
8
9
Find the mean, the median and the interquartile range of these scores.
mean = 59.7667 (to 4 d.p.)
position of median = (30 + 1)/2 th = 15½th
position of Q1 = (30 + 1)/4 th = 7¾th
position of Q3 = 3(30 + 1)/4 th = 23¼th
Q2 = (59 + 61) ÷ 2 = 60
Q1 = 50 – (50 – 49)/4 = 49.75
Q3 = 72
Interquartile range = 72 – 49.75 = 22.25
(b) If the score 73 is an incorrect record and the correct score is 43, which of the
statistics will have different values? Find the correct values of these statistics.
mean = 58.7667 (to 4 d.p.)
median = (58 + 59) ÷ 2 = 58.5
Q1 = 49 – (49 – 48)/4 = 48.75
Q3 = 72 (unchanged)
Interquartile range = 72 – 48.75 = 23.25
Example 16
To study the distribution of the monthly salaries (月薪分佈) of the 50 employees (僱員)
of a large factory, a stem-and-leaf diagram is used. The first 45 salaries are
represented in the diagram below. The remaining (餘下) 5 salaries are: $4100, $16200,
$7900, $9800, $7200.
Stem (in $1000)
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
Leaf (in 100)
0 5
6
0 0
1
0 1
2
3 5
8
1 2
6
1 2
5
5 7
2 5
9
0
7 8
6
6
2
4
7
4
5
8
5
5
5
6
5
8
7
8
0
1
9
(a)
Complete the stem-and-leaf diagram by adding the 5 salaries.
(b)
Find the median and interquartile range of the distribution of salaries.
(c) Why is the mean of the salaries so different from the median? Which measure,
the mean or the median, is a more appropriate indicator (適當指標) of the average
salary (收入代表數) in this case? Why?
(d)
Suppose the salaries are increased by 20% and then by a constant amount of
$100. What will be the median and the interquartile range of the new salary
distribution?
Example 17
To study the distribution of the monthly expenditure (每月支出) of the 27 employees
of an organization, a stem-and-leaf diagram is used.
Stem (in $1000)
3
4
5
6
7
8
9
10
12
13
14
15
(a)
Leaf (in 100)
4 5
7
1 1
4
5
0 5
8
9
3 6
8
2 8
2 5
2
6
5
8
7
0
6
7
9
Find the median and interquartile range of the distribution of expenditure.
Position of median = (2/4)(27 + 1)th = 14th, so Q2 = $6600
Position of Q1 = (1/4)(27 + 1)th = 7th, so Q1 = $4500
Position of Q3 = (3/4)(27 + 1)th = 21st, so Q3 = $10500
Interquartile range = Q3 – Q1 = $10500 – $4500 = $6000
(b) Without finding the mean, which measure of average, the mean or the median,
would have a larger value? Explain your answer briefly.
Since the distribution contains a number of large values, the mean is pulled upward
while the median is not.
∴ The mean should have a larger value.
(c) Suppose the expenditures are reduced by 20% and then by a constant amount
of $300. What will be the median and the interquartile range of the new
expenditure distribution?
New median = ($6600)(0.8) – $300 = $4980
New interquartile range = ($6000)(0.8) = $4800
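Part (c) relies on how a change of the form y = ax + b affects the two statistics: the median is scaled and shifted, while the interquartile range is only scaled. The sketch below illustrates this; the function names and the small check data set are hypothetical, and only the figures $6600, $6000, 0.8 and $300 come from Example 17.

```python
import statistics

def transform_median(median, scale, shift):
    """Median of y = scale * x + shift (for scale > 0)."""
    return scale * median + shift

def transform_iqr(iqr, scale):
    """IQR of y = scale * x + shift; the shift cancels in Q3 - Q1."""
    return abs(scale) * iqr

new_median = transform_median(6600, 0.8, -300)
new_iqr = transform_iqr(6000, 0.8)

# Check on a small hypothetical data set that the rule matches a direct
# recomputation of the median after transforming every value.
check_data = [3400, 4500, 6600, 10500, 15900]
transformed = [0.8 * x - 300 for x in check_data]
direct = statistics.median(transformed)
by_rule = transform_median(statistics.median(check_data), 0.8, -300)
print(new_median, new_iqr, direct, by_rule)
```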
(b) Box-and-Whisker diagram 方框端線圖
A diagrammatic method of showing the characteristic features (特徵) of a set of data,
including the minimum, the maximum, the quartiles, the median and the interquartile
range. It may also show how extreme (極值程序) the data are by defining inner
fences and outer fences.
Whisker 端線
The whiskers span the distance between the lower quartile (下四分位數) and the
minimum value (極小值), and the distance between the upper quartile (上四分位數)
and the maximum value (極大值) of the distribution. These lengths measure, to some
extent, how the smallest 25% and the largest 25% of the data are distributed.
Box 方框
The portion showing the positions of the lower quartile, the median and the upper
quartile. The length of the box indicates, to some extent, how the central 50% of the
data are distributed.
Example 18
The marks scored by 11 students in an examination are as follows:
40, 53, 60, 63, 65, 66, 69, 70, 71, 77, 92
Draw a Box-and-Whisker diagram and find out whether there are any outliers.
Position of Q1 = (1/4)(11 + 1)th = 3rd, so Q1 = 60
Position of Q2 = (2/4)(12)th = 6th, so Q2 = 66
Position of Q3 = (3/4)(12)th = 9th, so Q3 = 71
IQR = 71 – 60 = 11
[Box-and-Whisker diagram: minimum 40, Q1 = 60, Q2 = 66, Q3 = 71, maximum 92; the
box spans the IQR.]
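The notes do not state the fence formulas, so the sketch below assumes one widely used convention: inner fences at Q1 – 1.5 × IQR and Q3 + 1.5 × IQR, with values outside them flagged as outliers. The factor 1.5 and all function and variable names are assumptions, not the definition used in the course.

```python
def quartile(sorted_data, k):
    """k-th quartile at position k(n + 1)/4, interpolating if needed."""
    pos = k * (len(sorted_data) + 1) / 4
    lower = int(pos)
    frac = pos - lower
    value = sorted_data[lower - 1]
    if frac > 0:
        value += frac * (sorted_data[lower] - value)
    return value

marks = [40, 53, 60, 63, 65, 66, 69, 70, 71, 77, 92]  # already in order

q1 = quartile(marks, 1)
q3 = quartile(marks, 3)
iqr = q3 - q1

FENCE_FACTOR = 1.5  # assumed convention; not given in the notes
lower_fence = q1 - FENCE_FACTOR * iqr
upper_fence = q3 + FENCE_FACTOR * iqr

outliers = [x for x in marks if x < lower_fence or x > upper_fence]
print(lower_fence, upper_fence, outliers)
```

Under this assumed convention the fences are 43.5 and 87.5, so the marks 40 and 92 would both be flagged; a different fence definition could give a different answer.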
§10.5 Other examples
Example 19
---mean ,standard deviation
The following are two sets of data of an experiment obtained by two different students:
Student A
Student B
8
7
12
6
7
7
Volume of acid measured (cm3)
9
3
10
12
11
15
12
11
9
9
12
13
14
11
(i) What is the mean volume of acid measured by each student?
(ii) What is the standard deviation?
Which set of results is more reliable?
Example 20
--- Sample mean and sample standard deviation
Two machines, A and B, are used to pack biscuits. A sample of 10
packets was taken from each machine and the mass of each packet,
measured to the nearest gram, was noted. Find the standard deviation of the
masses of the packets taken in the sample for each machine. Comment on
your answer.
Machine A (mass in g): 196, 198, 198, 199, 200, 200, 201, 201, 202, 205
Machine B (mass in g): 192, 194, 195, 198, 200, 201, 203, 204, 206, 207
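As a sketch for Example 20, Python's `statistics` module offers both divisors: `pstdev` divides by n, as in the formula given earlier in these notes, and `stdev` divides by n – 1. Which one the exercise intends is not stated, so both are shown; the variable names are illustrative.

```python
import statistics

machine_a = [196, 198, 198, 199, 200, 200, 201, 201, 202, 205]
machine_b = [192, 194, 195, 198, 200, 201, 203, 204, 206, 207]

results = {}
for name, masses in (("A", machine_a), ("B", machine_b)):
    results[name] = {
        "mean": statistics.mean(masses),
        "sd_divisor_n": statistics.pstdev(masses),         # divisor n
        "sd_divisor_n_minus_1": statistics.stdev(masses),  # divisor n - 1
    }

for name, values in results.items():
    print(name, values)
```

Both machines give the same mean mass, but machine B has the larger standard deviation under either divisor, so its packets vary more in mass.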
Example 21
The annual salary bill of a small business company is as follows:
Director          $2,400,000
Manager           $2,000,000
Salesman          $300,000
Chief operator    $150,000
3 Operators       at $100,000 each
2 Secretaries     at $90,000 each
1 Apprentice      $80,000
Find the mode, the median and the mean salary of these 10 persons.
Which of these would you regard as the best indicator of the average
salary? Explain your answer.
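The three averages in Example 21 can be computed with the standard `statistics` module, as in the sketch below. The list is built directly from the salary table; the variable names are illustrative.

```python
import statistics

salaries = (
    [2_400_000, 2_000_000, 300_000, 150_000]  # director, manager, salesman, chief operator
    + [100_000] * 3                           # 3 operators
    + [90_000] * 2                            # 2 secretaries
    + [80_000]                                # 1 apprentice
)

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)
mode_salary = statistics.mode(salaries)
print(mean_salary, median_salary, mode_salary)
```

The two very large salaries raise the mean well above what most of the staff earn, whereas the median and the mode are unaffected by them.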