1 Outline

CHAPTER 1 OUTLINE
Population Data
Population
The numeric characteristic of all members of a specific group of individuals or things is called population data.
Example population: all students at IUPUI (about 29,000).
Numeric characteristics:
o Household income.
o Marital status (“married” = 1; “single” = 0).
o Work status (“full time” = 1; “part-time or not working” = 0)
o Age.
o Residency status (“in-state” = 1; “out-of-state” = 0)
o Commuting distance and time to campus.
Sample Data
A sample is a subset of the population.
The numeric characteristic of a subset of the specific group of individuals or things is
called sample data.
Population Parameter
A summary characteristic computed from population data is called a population parameter.

Population Mean µ (mu)
The mean of the population is a population parameter.
Example
Population data, X: the commuting distance of N = 88 E270 students:

13    3    8   20    6    3    6   24
27   30   18    4   24   23    5    3
20   15    9   21   22   17   13   19
24   20   30    8   23    3   12   17
18   27   12    8   17    9   24   30
 7   27   16    1   19   27   26   26
24   27   22   13   16   19   21   24
 6   11   19   10   15   11    7   23
 8   23   23   22    3    5   19   27
16   15   24   25   18    3   22   28
23   11    8    3    4   21    5   11

µ = ∑x⁄N
∑x = 1,419
N = 88
µ = 1,419⁄88 = 16.125
Chapter 1—Descriptive Statistics
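The population mean above can be checked with a few lines of Python. This is a minimal sketch: the list simply repeats the 88 commuting distances from the example, and the variable names (`distances`, `total`, `mu`) are illustrative and do not come from the notes.

```python
# Commuting distances of the N = 88 students, copied from the example above.
distances = [
    13, 3, 8, 20, 6, 3, 6, 24,
    27, 30, 18, 4, 24, 23, 5, 3,
    20, 15, 9, 21, 22, 17, 13, 19,
    24, 20, 30, 8, 23, 3, 12, 17,
    18, 27, 12, 8, 17, 9, 24, 30,
    7, 27, 16, 1, 19, 27, 26, 26,
    24, 27, 22, 13, 16, 19, 21, 24,
    6, 11, 19, 10, 15, 11, 7, 23,
    8, 23, 23, 22, 3, 5, 19, 27,
    16, 15, 24, 25, 18, 3, 22, 28,
    23, 11, 8, 3, 4, 21, 5, 11,
]

N = len(distances)      # population size
total = sum(distances)  # sum of all observations
mu = total / N          # population mean: sum divided by N

print(f"N = {N}, sum = {total}, mu = {mu}")
```

Because every member of the population is included, the result is a population parameter rather than an estimate.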

Population Proportion π (pi)
The population proportion is a population parameter.
Example
The population data, X: gender of N = 88 E270 students. X is binary data: Female = 1, Male = 0.

1  0  0  1  0  0  0  1  0  0  0
1  1  0  1  0  0  1  1  0  0  1
0  1  0  1  0  1  1  0  0  1  1
1  1  1  0  0  1  1  1  0  0  0
0  1  0  1  1  0  1  1  1  1  1
1  1  0  0  0  1  0  1  1  0  1
0  1  0  0  0  0  0  0  1  0  0
0  0  1  0  0  0  1  1  0  1  0

Proportion of students who are female:
π = ∑x⁄N
∑x = 41
N = 88
π = 41⁄88 = 0.466 (or 46.6%)
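Since the data are coded 0 and 1, the population proportion is just the mean of the binary values. The sketch below illustrates this with the 88 gender codes from the example; the variable names are illustrative assumptions.

```python
# Gender codes for the N = 88 students (Female = 1, Male = 0), from the example above.
gender = [
    1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0,
    1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1,
    0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1,
    1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0,
    0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1,
    1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1,
    0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0,
    0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0,
]

N = len(gender)          # population size
females = sum(gender)    # summing 0/1 codes counts the ones
pi = females / N         # population proportion

print(f"N = {N}, females = {females}, pi = {pi:.3f}")
```

Summing a 0/1 variable counts the members that have the characteristic, which is why the same formula serves for both the mean and the proportion.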
Sample Statistic
A summary characteristic computed from sample data is called a sample statistic.
Sample Mean x̅ (x-bar)
The mean of sample data is a sample statistic.
Example
The commuting distance of n = 5 students randomly selected from the population of E270 students:

9   20   10   3   26

x̅ = ∑x⁄n
∑x = 68
n = 5
x̅ = 68⁄5 = 13.6
Sample Proportion p̅ (p-bar)
The sample proportion is a sample statistic.
Example
The proportion of a sample of n = 50 IUPUI students who smoke tobacco.
“Smokes”: x = 1; “Does not smoke”: x = 0

0  1  1  0  1  0  0  0  0  0
0  0  0  1  0  1  0  0  0  0
0  1  1  0  1  0  0  0  0  0
0  0  0  1  0  0  0  0  0  0
0  0  0  0  0  0  0  0  0  0

p̅ = ∑x⁄n
∑x = 9
n = 50
p̅ = 9⁄50 = 0.18
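The two sample statistics above use the same arithmetic as their population counterparts, only applied to a subset. A minimal sketch using the sample values from the two examples; variable names are illustrative.

```python
# Sample of n = 5 commuting distances from the sample mean example.
commute_sample = [9, 20, 10, 3, 26]
x_bar = sum(commute_sample) / len(commute_sample)   # sample mean

# Sample of n = 50 smoking codes (Smokes = 1, Does not smoke = 0).
smokes = [
    0, 1, 1, 0, 1, 0, 0, 0, 0, 0,
    0, 0, 0, 1, 0, 1, 0, 0, 0, 0,
    0, 1, 1, 0, 1, 0, 0, 0, 0, 0,
    0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
]
p_bar = sum(smokes) / len(smokes)                   # sample proportion

print(f"x_bar = {x_bar}, p_bar = {p_bar}")
```

Note that x̅ = 13.6 differs from the population mean µ = 16.125: a statistic depends on which members happen to be in the sample.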
The Mean is the Center of Gravity of Data
The five items in your short grocery list have the following prices. Prices are rounded to the nearest dollar for ease of calculation.
Item i    Price ($) xi    Deviation xi − µ
1          2              -5
2          5              -2
3          3              -4
4          8               1
5         17              10
Sum       35               0

µ = ∑x⁄N = 35⁄5 = 7
The mean is the center of gravity of the data because
∑(𝑥𝑖 − µ) = 0
The sum of deviations from the mean is zero.
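The center-of-gravity property can be verified directly. The sketch below recomputes the deviations for the five grocery prices; the names `prices` and `deviations` are illustrative.

```python
# Prices of the five grocery items, in dollars.
prices = [2, 5, 3, 8, 17]

mu = sum(prices) / len(prices)            # mean price
deviations = [x - mu for x in prices]     # distance of each price from the mean

print("mean:", mu)
print("deviations:", deviations)
print("sum of deviations:", sum(deviations))
```

The negative deviations (-5, -2, -4) exactly offset the positive ones (1, 10), so the sum is zero.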
Weighted Mean
The weighted mean is used when each observation in the data set is assigned a specified weight. The weight of each data point can be either relative, or
be expressed as the frequency of that data point in the data set.
Example
An instructor assigns the following weights to various requirements in an introduction to statistics course:

Requirement             Weight
Homework assignments    0.20 (20%)
Midterm test            0.30 (30%)
Final                   0.50 (50%)

A student’s scores on these requirements, on a scale of 100, are shown below. The course letter grade is based on the average computed from these scores. Compute the average score.

Requirement    Score xi    Weight wi    xi wi
Homework       95          0.2          19.0
Midterm        84          0.3          25.2
Final          80          0.5          40.0
Sum                                     84.2

Weighted average = ∑xi wi = 84.2
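The weighted average in this example can be sketched as follows. It assumes, as in the example, that the weights are relative and sum to 1; the variable names are illustrative.

```python
# Scores and relative weights for homework, midterm, and final.
scores = [95, 84, 80]
weights = [0.2, 0.3, 0.5]

# Multiply each score by its weight and add the products.
weighted_average = sum(x * w for x, w in zip(scores, weights))

print(f"weighted average = {weighted_average:.1f}")
```

If the weights did not sum to 1, the sum of products would have to be divided by the sum of the weights.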
Example
The distribution of test scores (scale of 100) on the midterm is as follows. Compute the average score of the 50 students who took the test.
Score xi    Frequency fi    xi fi
 56          1                56
 64          1                64
 72          3               216
 76         10               760
 80         12               960
 84          8               672
 88          6               528
 92          4               368
 96          2               192
100          3               300
Sum         50              4116

∑xi fi = 4116
N = ∑fi = 50
µ = ∑xi fi⁄N = ∑xi fi⁄∑fi = 4116⁄50 = 82.32
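When the weights are frequencies, the weighted mean divides the sum of products by the total frequency. A minimal sketch for the test-score distribution above; variable names are illustrative.

```python
# Test scores and the number of students who earned each score.
scores = [56, 64, 72, 76, 80, 84, 88, 92, 96, 100]
freqs = [1, 1, 3, 10, 12, 8, 6, 4, 2, 3]

N = sum(freqs)                                           # total number of students
weighted_sum = sum(x * f for x, f in zip(scores, freqs)) # sum of score times frequency
mu = weighted_sum / N                                    # frequency-weighted mean

print(f"N = {N}, sum of x*f = {weighted_sum}, mu = {mu}")
```

This gives the same result as listing all 50 individual scores and averaging them.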
Measure of Dispersion (Variability) of Data—Variance
What is “dispersion” or “variability”? Dispersion or variability of data indicates how the data points in a data set are distributed along the number line.
This distribution or dispersion is measured against a benchmark on the number line. The benchmark is the mean or the center of gravity of that data set.
The distance of each data point from the mean is called the “deviation”. For the data set X, the deviation of each observation from the mean is 𝑥𝑖 − µ𝑥 .
For the data set Y, the deviation is 𝑦𝑖 − µ𝑦 . A logical and intuitive summary measure of the deviations is their average. To find
this average, sum the deviations and divide the sum by the number of deviations. But the sum of deviations from the mean is always zero, so the
average deviation is also always zero. To avoid this outcome, square the deviations and find the mean of the squared deviations. The mean of
squared deviations is the variance of the data.
Data set X
x      x − µ    (x − µ)²
14     -6        36
16     -4        16
18     -2         4
24      4        16
28      8        64
Sum: 100    0    136

Mean: µ = ∑x⁄N = 100⁄5 = 20
Variance: σ² = ∑(x − µ)²⁄N = 136⁄5 = 27.2
Standard deviation: σ = √σ² = √27.2 = 5.215

Data set Y
y      y − µ    (y − µ)²
 1     -16       256
 3     -14       196
 7     -10       100
24       7        49
50      33      1089
Sum: 85     0    1690

Mean: µ = ∑y⁄N = 85⁄5 = 17
Variance: σ² = ∑(y − µ)²⁄N = 1690⁄5 = 338
Standard deviation: σ = √338 = 18.385
Variance is the average squared deviation. Standard deviation is the square root of the variance: a kind of average deviation, computed in a roundabout way and expressed in the same units as the data.
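The variance and standard deviation of the two data sets can be reproduced with a small helper. This is a sketch under the population formula (dividing by N); the function name `population_variance` is an illustrative choice, not something defined in the notes.

```python
import math

def population_variance(data):
    """Mean of the squared deviations from the population mean."""
    n = len(data)
    mu = sum(data) / n
    return sum((value - mu) ** 2 for value in data) / n

x = [14, 16, 18, 24, 28]
y = [1, 3, 7, 24, 50]

var_x = population_variance(x)
var_y = population_variance(y)
sd_x = math.sqrt(var_x)   # standard deviation is the square root of the variance
sd_y = math.sqrt(var_y)

print(f"X: variance = {var_x:.1f}, sd = {sd_x:.3f}")
print(f"Y: variance = {var_y:.1f}, sd = {sd_y:.3f}")
```

The means of X and Y are close (20 and 17), but Y is far more spread out, which the much larger variance reflects.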
Simpler formula to compute the variance
The numerator of the variance formula can be calculated more simply by using the following formula:
∑(x − µ)² = ∑x² − Nµ²

Regular method
x       x − µ     (x − µ)²
 5.2    -4.38     19.1844
 6.9    -2.68      7.1824
 8.6    -0.98      0.9604
12.8     3.22     10.3684
14.4     4.82     23.2324
Sum: 47.9         60.9280

µ = ∑x⁄N = 47.9⁄5 = 9.58
∑(x − µ)² = 60.928
σ² = 60.928⁄5 = 12.1856

Simple method
x       x²
 5.2     27.04
 6.9     47.61
 8.6     73.96
12.8    163.84
14.4    207.36
Sum: 47.9    519.81

µ = ∑x⁄N = 47.9⁄5 = 9.58
∑x² = 519.81
∑x² − Nµ² = 519.81 − 5(9.58²) = 60.928
σ² = 60.928⁄5 = 12.1856
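The two ways of computing the numerator can be compared side by side. A minimal sketch with the five values from this example; the names `ss_regular` and `ss_simple` are illustrative. Floating-point rounding means the two results agree only to many decimal places, not bit for bit.

```python
data = [5.2, 6.9, 8.6, 12.8, 14.4]
N = len(data)
mu = sum(data) / N

# Regular method: sum of squared deviations from the mean.
ss_regular = sum((x - mu) ** 2 for x in data)

# Simple method: sum of squares minus N times the squared mean.
ss_simple = sum(x ** 2 for x in data) - N * mu ** 2

variance = ss_regular / N

print(f"mu = {mu:.2f}")
print(f"regular numerator = {ss_regular:.4f}")
print(f"simple numerator  = {ss_simple:.4f}")
print(f"variance = {variance:.4f}")
```

The simple method avoids computing each deviation, which is convenient by hand. In floating-point software it can lose precision when the mean is large relative to the spread.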
Variance of Binary Data
The variance of binary data is computed using the simple formula:
σ² = π(1 − π)
π is the population proportion (mean of binary data).
Using the simple method for computing the numerator of the variance formula (replacing µ with π), we have,
σ² = ∑(x − π)²⁄N = (∑x² − Nπ²)⁄N
Since x is binary, x² = x, so ∑x² = ∑x. For example:
∑x = 1 + 1 + 0 + 0 + 0 = 2
∑x² = 1 + 1 + 0 + 0 + 0 = 2
Therefore,
σ² = (∑x − Nπ²)⁄N = ∑x⁄N − π² = π − π² = π(1 − π)
For the example data,
π = ∑x⁄N = 2⁄5 = 0.4
σ² = π(1 − π) = 0.4(0.6) = 0.24
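The shortcut for binary data can be checked against the general variance formula. A minimal sketch using the five binary values from the example; variable names are illustrative.

```python
x = [1, 1, 0, 0, 0]
N = len(x)
pi = sum(x) / N   # proportion of ones, which is also the mean

# General formula: mean of squared deviations from the mean.
var_general = sum((value - pi) ** 2 for value in x) / N

# Shortcut that holds for 0/1 data.
var_binary = pi * (1 - pi)

print(f"pi = {pi}, general = {var_general:.2f}, shortcut = {var_binary:.2f}")
```

Both calculations give 0.24, because squaring a 0 or a 1 leaves it unchanged.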
Variance of Sample Data
Using x̅ for the mean of the sample data, the sum of squared deviations in the numerator of the variance formula is: ∑(x − x̅)²
But to find the average squared deviation of the sample data you must divide the sum of squared deviations by “n − 1”, where n (lower case) is the
number of data points in the sample (sample size).
The symbol for the sample variance is s².
s² = ∑(x − x̅)²⁄(n − 1)
The simple formula for the numerator of the sample variance is:
∑(x − x̅)² = ∑x² − n x̅²
Example
Below is the fill (in ounces) of a random sample of 5 bottles of soda. Compute the mean, variance, and standard deviation of the fill.
The regular method
x       x − x̅    (x − x̅)²
16.2     0.3      0.09
15.8    -0.1      0.01
15.6    -0.3      0.09
16.4     0.5      0.25
15.5    -0.4      0.16
∑x = 79.5    ∑(x − x̅) = 0.0    ∑(x − x̅)² = 0.60

x̅ = ∑x⁄n = 79.5⁄5 = 15.9
s² = ∑(x − x̅)²⁄(n − 1) = 0.60⁄4 = 0.15
s = √0.15 = 0.387

The “simple” method
x       x²
16.2    262.44
15.8    249.64
15.6    243.36
16.4    268.96
15.5    240.25
∑x = 79.5    ∑x² = 1264.65

x̅ = ∑x⁄n = 79.5⁄5 = 15.9
∑x² − n x̅² = 1264.65 − 5(15.9²) = 0.60
s² = 0.60⁄4 = 0.15
s = √0.15 = 0.387
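Python's standard `statistics` module follows the same convention as this section: `statistics.variance` and `statistics.stdev` divide by n − 1, while `statistics.pvariance` and `statistics.pstdev` divide by N. The sketch below applies the sample versions to the soda-fill data; the variable names are illustrative.

```python
import statistics

# Fill, in ounces, of the 5 sampled bottles.
fills = [16.2, 15.8, 15.6, 16.4, 15.5]
n = len(fills)

x_bar = statistics.mean(fills)    # sample mean
s2 = statistics.variance(fills)   # sample variance, divides by n - 1
s = statistics.stdev(fills)       # sample standard deviation

# Numerator by the simple method: sum of squares minus n times the squared mean.
ss_simple = sum(x ** 2 for x in fills) - n * x_bar ** 2

print(f"x_bar = {x_bar:.1f}, s2 = {s2:.2f}, s = {s:.3f}")
print(f"simple numerator = {ss_simple:.2f}")
```

Using `statistics.pvariance` here instead would divide 0.60 by 5 rather than 4 and give 0.12, the population formula.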
The z-score
We use the standard deviation of the data set (σ for the population data, and s for the sample data) to transform the observations in a data set such that
they are expressed in terms of their deviation from the mean measured in units of, or relative to, the standard deviation.
zi = (xi − µ)⁄σ
For example, suppose the mean of a data set is µ = 20 and the standard deviation is σ = 6, and suppose one of the observations in this data set is xi = 32.
Then the deviation of this data point from the mean is xi − µ = 32 − 20 = 12. Given σ = 6, this observation, xi = 32, is (32 − 20)⁄6 = 2 standard
deviations from the mean. The deviation “xi − µ” is expressed in units of the standard deviation σ. This is the z-score.
z = (32 − 20)⁄6 = 2
You can find the z-score for each observation in the data set.
Example
Find the mean µ and standard deviation σ of the following population data set. Then transform each xi into the z-score.
xi: 8, 12, 18, 26, 41

To find µ and σ:
x      x − µ    (x − µ)²
 8     -13      169
12      -9       81
18      -3        9
26       5       25
41      20      400
Sum: 105    0    684

µ = ∑x⁄N = 105⁄5 = 21
σ² = ∑(x − µ)²⁄N = 684⁄5 = 136.80
σ = √136.8 = 11.696

To find the z-scores:
x      z = (x − µ)⁄σ    z²
 8     -1.11            1.24
12     -0.77            0.59
18     -0.26            0.07
26      0.43            0.18
41      1.71            2.92
Sum: 105    0.00        5.00

The Mean and Standard Deviation of z
∑z = 0
µz = ∑z⁄N = 0
σz² = ∑(z − µz)²⁄N = ∑z²⁄N = 5⁄5 = 1
σz = 1
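The z-score transformation and its two properties (mean 0, standard deviation 1) can be sketched as follows. It uses the population formulas from this example; the variable names are illustrative.

```python
import math

data = [8, 12, 18, 26, 41]
N = len(data)

mu = sum(data) / N
sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / N)   # population standard deviation

# Express each observation as a number of standard deviations from the mean.
z = [(x - mu) / sigma for x in data]

mu_z = sum(z) / N
sigma_z = math.sqrt(sum((value - mu_z) ** 2 for value in z) / N)

print("z-scores:", [round(value, 2) for value in z])
print(f"mean of z = {mu_z:.2f}, standard deviation of z = {sigma_z:.2f}")
```

Whatever the original units of the data, the z-scores always have mean 0 and standard deviation 1, which makes observations from different data sets comparable.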