Review 1 2003

Review 1
2004.11.10
Chapter 1:
1. Elements, Variables, and Observations
2. Types of Data: Qualitative Data and Quantitative Data
(a) Qualitative data may be nonnumeric or numeric.
(b) Quantitative data are always numeric.
(c) Arithmetic operations are only meaningful with quantitative data.
Chapter 2: Figure 2.22, p. 66.
1. Summarizing qualitative data:
• Frequency distribution, relative frequency distribution, and
percent frequency distribution.
• Bar plot and pie plot.
2. Summarizing quantitative data:
• Frequency distribution, relative frequency distribution, percent
frequency distribution, cumulative frequency distribution,
cumulative relative frequency distribution, cumulative percent
frequency distribution.
• Histogram, ogive, and stem-and-leaf display.
Chapter 3:
Measures of Location, Dispersion, Exploratory Data Analysis,
Measures of Relative Location, Weighted and Grouped Mean and
Variance
Chapter 4:
• Tabular and Graphical Methods: Crosstabulation (qualitative
and quantitative data) and Scatter Diagram (only quantitative
data).
• Numerical Method: Covariance and Correlation Coefficient.
Example 1 (Chapter 1)
A magazine surveyed a sample of its subscribers. Some of the responses from the
survey are shown below.
Subscriber ID   Gender   Age   Income ($1000)
0006            F        22    45
4798            M        21    53
2291            F        33    82
4988            M        38    30
(a) How many elements are in the data set? Write them down.
(b) How many variables are in the data set? Write them down.
(c) How many observations are in the data set? Write them down.
(d) Which of the variables above (Gender, Age, Income) are qualitative and
which are quantitative?
(e) Are the data time series or cross-sectional?
[solution:]
(a) 4 elements, subscribers: 0006, 4798, 2291, and 4988.
(b) 3 variables, Gender, Age, and Income.
(c) 4 observations, (F, 22, 45), (M, 21, 53), (F, 33, 82) and (M, 38, 30).
(d) Quantitative: Age and Income; Qualitative: Gender.
(e) The data are cross-sectional.
Example 2 (Chapter 2)
Consider the sample data in the following frequency distribution.
Class    Frequency
3-7      4
8-12     7
13-17    9
18-22    5
Summarize the data by constructing:
(a) a relative frequency distribution, a cumulative relative frequency distribution,
a cumulative percent frequency distribution.
(b) a histogram and an ogive.
(c) the mean, the standard deviation and the coefficient of variation for the
grouped data.
[solution:]
(a)
Class    Relative Frequency   Cumulative Relative Frequency   Cumulative Percent Frequency
3-7      0.16                 0.16                            16
8-12     0.28                 0.44                            44
13-17    0.36                 0.80                            80
18-22    0.20                 1.00                            100
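The table in (a) can be reproduced with a few lines of Python. This is only a sketch: the variable names are illustrative, and the class labels and frequencies are copied from the frequency distribution given in the example.

```python
from itertools import accumulate

# Class labels and frequencies from the frequency distribution in Example 2.
classes = ["3-7", "8-12", "13-17", "18-22"]
freqs = [4, 7, 9, 5]

n = sum(freqs)                                 # sample size
rel = [f / n for f in freqs]                   # relative frequencies
cum_rel = [c / n for c in accumulate(freqs)]   # cumulative relative frequencies
cum_pct = [100 * c for c in cum_rel]           # cumulative percent frequencies

# Print one row per class: label, relative, cumulative relative, cumulative percent.
for row in zip(classes, rel, cum_rel, cum_pct):
    print("{:<6} {:.2f} {:.2f} {:.0f}".format(*row))
```

The running totals from `accumulate` are divided by n only once, which avoids adding up rounded relative frequencies.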
(c)
$$\bar{x}_g = \frac{5 \cdot 4 + 10 \cdot 7 + 15 \cdot 9 + 20 \cdot 5}{25} = 13.$$
Since
$$s_g^2 = \frac{(5-13)^2 \cdot 4 + (10-13)^2 \cdot 7 + (15-13)^2 \cdot 9 + (20-13)^2 \cdot 5}{25-1} = 25,$$
$$s_g = \sqrt{25} = 5$$
and
$$\text{c.v.} = \frac{s_g}{\bar{x}_g} \times 100 = 38.46.$$
Example 3 (Chapter 3):
For the following data, 2, 1, 0, 2, 0, 2, 1, 2, 0, 2, 1, 2,
(a) Compute the mean
(b) The standard deviation.
(c) The coefficient of variation.
(d) The (100/3)th percentile.
(e) The 82nd percentile.
(f) The mode.
(g) The interquartile range.
(h) The five number summary for the data.
(i) The box plot.
(j) Determine the outlier.
[solution:]
(a)
$$\bar{x} = \frac{\sum_{i=1}^{12} x_i}{12} = \frac{2 + 1 + \cdots + 1 + 2}{12} = 1.25$$
(b)
$$s = \sqrt{\frac{\sum_{i=1}^{12} (x_i - \bar{x})^2}{12-1}} = \sqrt{\frac{(2-1.25)^2 + (1-1.25)^2 + \cdots + (1-1.25)^2 + (2-1.25)^2}{11}} \approx 0.866$$
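(c)
$$\text{C.V.} = \frac{s}{\bar{x}} \times 100 = \frac{0.866}{1.25} \times 100 = 69.28.$$
(d) 1. The data, in ascending order, are
0 0 0 1 1 1 2 2 2 2 2 2
2. The index is
$$i = 12 \times \frac{100/3}{100} = 4.$$
Since the index is an integer, we average the 4th and 5th ordered values. Thus,
$$\frac{1+1}{2} = 1$$
is the (100/3)th percentile.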
(e) The index is
$$i = 12 \times \frac{82}{100} = 9.84.$$
Rounding up, the 10th value of the ordered data in (d), 2, is the 82nd percentile.
(f) The mode is 2.
(g) Since
$$Q_1 = \frac{0+1}{2} = 0.5, \quad Q_3 = \frac{2+2}{2} = 2,$$
$$\text{IQR} = Q_3 - Q_1 = 2 - 0.5 = 1.5.$$
(h)
Minimum   Q1    Q2    Q3   Maximum
0         0.5   1.5   2    2
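The percentile rule used in parts (d), (e), (g), and (h) can be sketched in Python. The helper name `percentile` is hypothetical. The 1.5 x IQR fences at the end are an assumption for part (j), since the review does not state which outlier rule it intends.

```python
from math import ceil

def percentile(data, p):
    """p-th percentile (0 < p < 100) via the index rule i = (p / 100) * n."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    values = sorted(data)
    i = p * len(values) / 100
    if abs(i - round(i)) < 1e-9:
        # Integer index: average the i-th and (i+1)-th ordered values.
        k = round(i)
        return (values[k - 1] + values[k]) / 2
    # Non-integer index: round up and take that ordered value.
    return values[ceil(i) - 1]

data = [2, 1, 0, 2, 0, 2, 1, 2, 0, 2, 1, 2]

q1, q2, q3 = (percentile(data, p) for p in (25, 50, 75))
iqr = q3 - q1
five_number = (min(data), q1, q2, q3, max(data))

# Assumed outlier rule: values beyond 1.5 * IQR from the quartiles.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < lower or x > upper]

print(percentile(data, 100 / 3), percentile(data, 82))
print(five_number, iqr, outliers)
```

Under that assumed rule the fences are -1.75 and 4.25, so no value in this data set would be flagged. Library routines often use different interpolation rules and can give slightly different quartiles.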
Example 4 (Chapter 3):
Suppose we have the following data:
Rent      Frequency
420-439   8
440-459   17
460-479   12
480-499   8
500-519   7
520-539   4
540-559   2
560-579   4
580-599   2
600-619   6
What are the mean rent and the sample variance for the rent?
[solution:]
$$\bar{x}_g = \frac{\sum_{i=1}^{10} f_i M_i}{n} = \frac{\sum_{i=1}^{10} f_i M_i}{70},$$
where $f_i$ is the frequency of class $i$, $M_i$ is the midpoint of class $i$,
and $n$ is the sample size. Then,
Rent      f_i   M_i
420-439   8     429.5
440-459   17    449.5
460-479   12    469.5
480-499   8     489.5
500-519   7     509.5
520-539   4     529.5
540-559   2     549.5
560-579   4     569.5
580-599   2     589.5
600-619   6     609.5
Thus,
$$\sum_{i=1}^{10} f_i M_i = 34525$$
and
$$\bar{x}_g = \frac{34525}{70} = 493.21.$$
For the sample variance,
$$s_g^2 = \frac{\sum_{i=1}^{10} f_i (M_i - \bar{x}_g)^2}{70-1} = \frac{208234.287}{69} = 3017.89.$$
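The arithmetic for the rent data can be verified with a short Python sketch. It assumes each midpoint is the average of the class limits, which matches the midpoints used in the solution (for example 429.5 for 420-439); the variable names are illustrative.

```python
# Class limits 420-439, 440-459, ..., 600-619 and their frequencies.
classes = [(lo, lo + 19) for lo in range(420, 620, 20)]
freqs = [8, 17, 12, 8, 7, 4, 2, 4, 2, 6]

mids = [(lo + hi) / 2 for lo, hi in classes]        # midpoints M_i
n = sum(freqs)                                      # sample size
total = sum(f * m for f, m in zip(freqs, mids))     # sum of f_i * M_i

mean_g = total / n
ss = sum(f * (m - mean_g) ** 2 for f, m in zip(freqs, mids))
var_g = ss / (n - 1)                                # grouped sample variance

print(n, total, round(mean_g, 2), round(var_g, 2))
```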
Example 5 (Chapter 3)
(a) Consider a sample with data values of 10, 20, 12, 17 and 16. Compute the
z-score for each of the five observations.
(b) Suppose the data have a bell-shaped distribution with a mean of 20 and a
standard deviation of 5. Use both Chebyshev’s theorem and the empirical
rule to determine the percentage of data within the range 10-30.
[solution:]
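(a) Since
$$\bar{x} = \frac{10 + 20 + 12 + 17 + 16}{5} = 15$$
and
$$s = \sqrt{\frac{(10-15)^2 + (20-15)^2 + (12-15)^2 + (17-15)^2 + (16-15)^2}{5-1}} = \sqrt{\frac{64}{4}} = 4,$$
the z-scores are
$$z_1 = \frac{x_1 - \bar{x}}{s} = \frac{10-15}{4} = -1.25, \quad z_2 = \frac{x_2 - \bar{x}}{s} = \frac{20-15}{4} = 1.25, \quad z_3 = \frac{x_3 - \bar{x}}{s} = \frac{12-15}{4} = -0.75,$$
$$z_4 = \frac{x_4 - \bar{x}}{s} = \frac{17-15}{4} = 0.5, \quad z_5 = \frac{x_5 - \bar{x}}{s} = \frac{16-15}{4} = 0.25.$$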
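The z-scores in (a) can be checked with the standard `statistics` module. This is a minimal sketch with illustrative variable names; each z-score is the observation's distance from the mean in units of the sample standard deviation.

```python
import statistics

data = [10, 20, 12, 17, 16]

mean = statistics.mean(data)   # sample mean
s = statistics.stdev(data)     # sample standard deviation (n - 1 in the denominator)

# z-score of each observation: (x - mean) / s
z = [(x - mean) / s for x in data]

print(mean, s, z)
```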
(b)
$$[10, 30] = 20 \pm 10 = \bar{x} \pm 2s.$$
Thus, by Chebyshev's theorem, at least
$$\left(1 - \frac{1}{2^2}\right) \times 100\% = 75\%$$
of the data values lie within 2 standard deviations of the mean.
By the empirical rule, approximately 95% of the data values will be within
this interval.
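The two answers in (b) can be contrasted in a short Python sketch. The function name `chebyshev_lower_bound` is a hypothetical helper, and the `empirical_rule` dictionary simply records the usual approximate percentages for bell-shaped data.

```python
def chebyshev_lower_bound(z):
    """Minimum fraction of data within z standard deviations of the mean (z > 1)."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z ** 2

# Approximate percentages within 1, 2, and 3 standard deviations (bell-shaped data).
empirical_rule = {1: 68, 2: 95, 3: 99.7}

mean, sd = 20, 5
lo, hi = 10, 30

z = (hi - mean) / sd               # the interval is mean +/- z standard deviations
assert (mean - lo) / sd == z       # confirm the interval is symmetric about the mean

bound_pct = 100 * chebyshev_lower_bound(z)
print(z, bound_pct, empirical_rule[int(z)])
```

Chebyshev's bound holds for any distribution, which is why it is weaker than the empirical rule's figure for bell-shaped data.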
Example 6 (Chapter 4)
(a) The following data are for 30 observations on two qualitative variables x and
y. The categories for x are A, B, and C; the categories for y are 1 and 2.
Observation   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
x             A   B   B   C   B   C   B   C   A   B   A   B   C   C   C
y             1   1   1   2   1   2   1   2   1   1   1   1   2   2   2

Observation  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30
x             B   C   B   C   B   C   B   C   A   B   C   C   A   B   B
y             2   1   1   1   1   2   1   2   1   1   2   2   1   1   2
(i) Develop a crosstabulation for the data, with x in columns and y in rows.
(ii) What is the relationship, if any, between x and y?
(b) For the following data,
X     2    4    6     8    10
Y    -5   -7   -9   -11   -13
Compute and interpret the sample covariance and the sample correlation
coefficient.
[solution:]
(a)
(i)
y \ x    A    B    C    Total
1        5    11   2    18
2        0    2    10   12
Total    5    13   12   30
(ii)
Category A values for x are always associated with category 1 values for y.
Category B values for x are usually associated with category 1 values for y.
Category C values for x are usually associated with category 2 values for y.
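The crosstabulation in (i) can be rebuilt by counting (y, x) pairs. This sketch uses `collections.Counter` from the standard library; the data are the 30 observations from the example, and the variable names are illustrative.

```python
from collections import Counter

# Observations 1-15 followed by observations 16-30.
x = list("ABBCBCBCABABCCC" "BCBCBCBCABCCABB")
y = [1, 1, 1, 2, 1, 2, 1, 2, 1, 1, 1, 1, 2, 2, 2,
     2, 1, 1, 1, 1, 2, 1, 2, 1, 1, 2, 2, 1, 1, 2]

counts = Counter(zip(y, x))    # counts[(row category, column category)]

cols = ["A", "B", "C"]
rows = [1, 2]
table = {r: [counts[(r, c)] for c in cols] for r in rows}
col_totals = [sum(table[r][j] for r in rows) for j in range(len(cols))]

# Print the table with y in rows, x in columns, and marginal totals.
print("y\\x", *cols, "Total")
for r in rows:
    print(r, *table[r], sum(table[r]))
print("Total", *col_totals, sum(col_totals))
```

A missing pair such as (2, "A") is reported as 0 because a `Counter` returns zero for keys it has not seen.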
(b) Since
$$\bar{x} = 6, \quad \bar{y} = -9,$$
$$s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1} = \frac{(2-6)(-5+9) + (4-6)(-7+9) + (6-6)(-9+9) + (8-6)(-11+9) + (10-6)(-13+9)}{5-1} = -10.$$
Also, since
$$s_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{(2-6)^2 + (4-6)^2 + (6-6)^2 + (8-6)^2 + (10-6)^2}{5-1} = 10$$
and
$$s_y^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1} = \frac{(-5+9)^2 + (-7+9)^2 + (-9+9)^2 + (-11+9)^2 + (-13+9)^2}{5-1} = 10,$$
we have
$$r_{xy} = \frac{s_{xy}}{\sqrt{s_x^2 s_y^2}} = \frac{-10}{\sqrt{10 \cdot 10}} = -1.$$
The negative covariance indicates that the two variables are negatively related.
Further, the correlation coefficient of -1 indicates a perfect linear relationship
between the two variables with negative slope.
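The covariance and correlation coefficient in (b) can be checked directly from their definitions. This is a sketch in plain Python with illustrative variable names. Because of floating-point rounding, the computed r may differ from -1 in the last decimal places.

```python
from math import sqrt

x = [2, 4, 6, 8, 10]
y = [-5, -7, -9, -11, -13]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sample covariance and sample variances, all with n - 1 in the denominator.
s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
var_x = sum((a - mean_x) ** 2 for a in x) / (n - 1)
var_y = sum((b - mean_y) ** 2 for b in y) / (n - 1)

# Sample correlation coefficient: covariance divided by the product of standard deviations.
r_xy = s_xy / (sqrt(var_x) * sqrt(var_y))

print(mean_x, mean_y, s_xy, var_x, var_y, r_xy)
```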