Chapter 2 Tutorial

2nd & 3rd LAB
Boxplot Example 1
Draw the boxplot for the following set:
1, 1, 3, 5, 6, 6, 7, 8, 12, 13, 15
Min=1
Q1=3
Median=6
Q3=12
Max=15
IQR=12-3=9
1.5×IQR=13.5
Q1-1.5×IQR=-10.5 (lower fence)
Q3+1.5×IQR=25.5 (upper fence)
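The quartiles here follow a "median of each half" rule: with an odd number of values, the median is left out of both halves, so Q1 = 3 is the median of 1, 1, 3, 5, 6 and Q3 = 12 is the median of 7, 8, 12, 13, 15. A minimal Python sketch of that rule, assuming this convention (many tools interpolate instead and can give slightly different quartiles); `five_number_summary` and `fences` are hypothetical helper names:

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]              # values before the middle position
    upper = data[half + n % 2:]      # skip the middle value when n is odd
    return data[0], median(lower), median(data), median(upper), data[-1]

def fences(q1, q3, k=1.5):
    """Return the limits Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

sample1 = [1, 1, 3, 5, 6, 6, 7, 8, 12, 13, 15]
summary = five_number_summary(sample1)
print(summary)                         # (1, 3, 6, 12, 15)
print(fences(summary[1], summary[3]))  # (-10.5, 25.5)
```

With an even count, such as the 18 grades in Q2, the two halves simply split down the middle.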
[Figure: boxplot of Sample 1 showing Min=1, Q1=3, Median=6, Q3=12, Max=15]
Boxplot Example 2
Draw the boxplot for the following set:
2, 3, 3, 6, 7, 7, 7, 9, 13, 15, 30
Min=2
Q1=3
Median=7
Q3=13
Max=30
IQR=13-3=10
1.5×IQR=15
Q1-1.5×IQR=-12 (lower fence)
Q3+1.5×IQR=28 (upper fence)
[Figure: boxplot of Sample 1 showing Min=2, Q1=3, Median=7, Q3=13, Max=30, with the limits Q1-1.5×IQR=-12 and Q3+1.5×IQR=28; legend: Min Outlier, Max Outlier]
Terminate the whiskers at the most extreme observations within 1.5×IQR of the quartiles. Here the lower whisker ends at the minimum, 2, and the upper whisker ends at 15, the largest value not exceeding 28. The maximum, 30, lies beyond 28 and is plotted separately as an outlier.
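The whisker rule can be written as a filter: values inside the limits set the whisker ends, and values outside are drawn individually. A minimal Python sketch, assuming inclusive limits and taking Q1-1.5×IQR=-12 and Q3+1.5×IQR=28 from this example as inputs; `whiskers_and_outliers` is a hypothetical name:

```python
def whiskers_and_outliers(values, low_limit, high_limit):
    """Whiskers stop at the most extreme observations inside the limits;
    every value outside the limits is returned as an outlier."""
    inside = [v for v in values if low_limit <= v <= high_limit]
    outliers = sorted(v for v in values if v < low_limit or v > high_limit)
    return min(inside), max(inside), outliers

sample2 = [2, 3, 3, 6, 7, 7, 7, 9, 13, 15, 30]
# Limits for Example 2: Q1 - 1.5×IQR = -12 and Q3 + 1.5×IQR = 28
print(whiskers_and_outliers(sample2, -12, 28))  # (2, 15, [30])
```

For Example 1 the same call with -10.5 and 25.5 returns no outliers, so its whiskers run from 1 to 15.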
Q2
2) Suppose that the data for analysis includes the
attribute grade. The grade values for the data
tuples are:
4, 5, 9, 11, 12, 13, 13, 13, 13, 14, 15, 15, 16, 17, 18, 18, 19, 20
(a) What is the mean of the data? What is the
median?
• Using Equation (2.3), the mean = 245/18 ≈ 13.61
• The median = (13+14)/2 = 13.5, the average of the 9th and 10th of the 18 ordered values
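Both values can be checked with Python's standard `statistics` module. A short sketch; the list name `grades` is illustrative:

```python
from statistics import mean, median

grades = [4, 5, 9, 11, 12, 13, 13, 13, 13, 14, 15, 15, 16, 17, 18, 18, 19, 20]

print(sum(grades), len(grades))  # 245 18
print(round(mean(grades), 2))    # 13.61
print(median(grades))            # 13.5, mean of the 9th and 10th values
```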
(b) What is the mode of the data? Comment on the
data's modality (i.e., bimodal, trimodal, etc.).
• The mode (value occurring with the greatest
frequency) of the data is 13, which occurs four times.
Since there is only one mode, the data are unimodal.
(c) What is the midrange of the data?
• The midrange (average of the largest and smallest
values in the data set) of the data is:
(20 + 4) / 2 = 12
(d) Can you find (roughly) the first quartile (Q1) and
the third quartile (Q3) of the data?
• The first quartile (corresponding to the 25th
percentile) of the data is: 12. The third quartile
(corresponding to the 75th percentile) of the data
is: 17.
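The word "roughly" matters because quartile conventions differ. The values 12 and 17 are the medians of the lower and upper nine grades. Python's `statistics.quantiles` instead interpolates between neighbouring values and offers two methods; a sketch comparing them on the same data:

```python
from statistics import quantiles

grades = [4, 5, 9, 11, 12, 13, 13, 13, 13, 14, 15, 15, 16, 17, 18, 18, 19, 20]

# Default method="exclusive": the p-th quantile sits at 1-based rank p*(n+1)
print(quantiles(grades, n=4))                      # [11.75, 13.5, 17.25]
# method="inclusive": the p-th quantile sits at 1-based rank p*(n-1)+1
print(quantiles(grades, n=4, method="inclusive"))  # [12.25, 13.5, 16.75]
```

All three conventions agree on the median and place Q1 near 12 and Q3 near 17, which is as precise as part (d) asks for.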
(e) Give the five-number summary of the data.
• The five-number summary of a distribution consists of
the minimum value, first quartile, median value,
third quartile, and maximum value. It provides a
good summary of the shape of the distribution and
for these data is: 4, 12, 13.5, 17, 20
(f) Show a boxplot of the data.
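[Figure: boxplot of the grade data (Sample 1); legend: Min Outlier, Max Outlier]
IQR=17-12=5, so Q1-1.5×IQR=4.5 and Q3+1.5×IQR=24.5. The minimum, 4, lies below 4.5 and is plotted as an outlier; the lower whisker ends at 5 and the upper whisker at the maximum, 20.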
Q3
3) Suppose that the data for analysis includes the
attribute age. The age values for the data tuples
are:
13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70
(a) What is the mean of the data? What is the
median?
• Using Equation (2.1), the (arithmetic) mean of the
data is 809/27 ≈ 29.96, or about 30. The median (middle value of
the ordered set, since the number of values, 27, is
odd) is the 14th value: 25.
(b) What is the mode of the data? Comment on the
data's modality (i.e., bimodal, trimodal, etc.).
• This data set has two values that occur with the
same highest frequency and is, therefore, bimodal.
• The modes (values occurring with the greatest
frequency) of the data are 25 and 35.
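`statistics.multimode` returns every value tied for the highest frequency, which makes the unimodal/bimodal distinction mechanical. A minimal sketch covering both the grade data of Q2 and the age data here; the `modality` helper and its labels are illustrative:

```python
from statistics import multimode

def modality(values):
    """Return the modes and a label based on how many there are.
    Note: if every value occurs once, multimode returns all of them,
    so the label is only meaningful when some value repeats."""
    modes = multimode(values)
    labels = {1: "unimodal", 2: "bimodal", 3: "trimodal"}
    return modes, labels.get(len(modes), "multimodal")

grades = [4, 5, 9, 11, 12, 13, 13, 13, 13, 14, 15, 15, 16, 17, 18, 18, 19, 20]
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33,
        35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

print(modality(grades))  # ([13], 'unimodal')
print(modality(ages))    # ([25, 35], 'bimodal')
```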
(c) What is the midrange of the data?
• The midrange (average of the largest and smallest
values in the data set) of the data is:
(70 + 13) / 2 = 41.5
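The midrange is a one-line computation; a sketch applied to both the age data and the grade data of Q2 (`midrange` is an illustrative name, not a standard-library function):

```python
def midrange(values):
    """Average of the largest and smallest values."""
    return (max(values) + min(values)) / 2

ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33,
        35, 35, 35, 35, 36, 40, 45, 46, 52, 70]
grades = [4, 5, 9, 11, 12, 13, 13, 13, 13, 14, 15, 15, 16, 17, 18, 18, 19, 20]

print(midrange(ages))    # 41.5
print(midrange(grades))  # 12.0
```

Because it uses only the two extremes, the midrange is pulled up by the single large age, 70, well above the median of 25.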
(d) Can you find (roughly) the first quartile (Q1) and
the third quartile (Q3) of the data?
• The first quartile (corresponding to the 25th
percentile) of the data is: 20. The third quartile
(corresponding to the 75th percentile) of the data
is: 35.
(e) Give the five-number summary of the data.
• The five-number summary of a distribution consists
of the minimum value, first quartile, median value,
third quartile, and maximum value. It provides a
good summary of the shape of the distribution and
for these data is: 13, 20, 25, 35, 70.
(f) Show a boxplot of the data.
Q4
4) Suppose a manager tested the age and body fat
data for 18 randomly selected adults, with the
following results (each age is paired with the body fat value in the same position):
age:          23, 23, 27, 27, 39, 41, 47, 49, 50, 52, 54, 54, 56, 57, 58, 58, 60, 61
body fat (%): 9.5, 26.5, 7.8, 17.8, 31.4, 25.9, 27.4, 27.2, 31.2, 34.6, 42.5, 28.8, 33.4, 30.2, 34.1, 32.9, 41.2, 35.7
(a) Calculate the mean, median and standard
deviation of age and body fat.
• For the variable age the mean is 46.44, the median
is 51, and the standard deviation is 12.85.
• For the variable body fat the mean is 28.78, the
median is 30.7, and the standard deviation is 8.99.
• Both standard deviations are population values (dividing by N = 18).
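Whether the standard deviation divides by N or by N - 1 changes the result, so it is worth checking which one is meant. A sketch using the standard `statistics` module, where `pstdev` is the population version and `stdev` the sample version; the list names are illustrative:

```python
from statistics import mean, median, pstdev, stdev

age = [23, 23, 27, 27, 39, 41, 47, 49, 50, 52, 54, 54, 56, 57, 58, 58, 60, 61]
fat = [9.5, 26.5, 7.8, 17.8, 31.4, 25.9, 27.4, 27.2, 31.2,
       34.6, 42.5, 28.8, 33.4, 30.2, 34.1, 32.9, 41.2, 35.7]

for name, xs in (("age", age), ("body fat", fat)):
    print(
        name,
        round(mean(xs), 2),
        round(median(xs), 2),
        round(pstdev(xs), 2),  # population: divide by N
        round(stdev(xs), 2),   # sample: divide by N - 1
    )
```

The population values (12.85 and 8.99) match the answers above; the N - 1 versions come out somewhat larger.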
(b) Draw the boxplots for age and body fat.
Sorted body fat: 7.8, 9.5, 17.8, 25.9, 26.5, 27.2, 27.4, 28.8, 30.2, 31.2, 31.4, 32.9, 33.4, 34.1, 34.6, 35.7, 41.2, 42.5
Sorted age: 23, 23, 27, 27, 39, 41, 47, 49, 50, 52, 54, 54, 56, 57, 58, 58, 60, 61
(c) Draw a scatter plot and a q-q plot based on these
two variables.
[Figure: scatter plot of the 18 (age, body fat) pairs]
[Figure: q-q plot of the sorted age values against the sorted body fat values]
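The scatter plot and the q-q plot use the same 36 numbers but pair them differently: the scatter plot keeps each adult's (age, body fat) pair, while a q-q plot of two equal-size samples pairs the i-th smallest value of one with the i-th smallest value of the other. A minimal sketch that only builds the coordinates, assuming the two lists are aligned by adult as in the table; `scatter_points` and `qq_points` are illustrative names. The points could then be passed to a plotting routine such as matplotlib's `pyplot.scatter(x, y)`.

```python
age = [23, 23, 27, 27, 39, 41, 47, 49, 50, 52, 54, 54, 56, 57, 58, 58, 60, 61]
fat = [9.5, 26.5, 7.8, 17.8, 31.4, 25.9, 27.4, 27.2, 31.2,
       34.6, 42.5, 28.8, 33.4, 30.2, 34.1, 32.9, 41.2, 35.7]

# Scatter plot: keep each adult's (age, body fat) pair together
scatter_points = list(zip(age, fat))

# q-q plot: pair matching quantiles, i.e. the i-th smallest of each sample
qq_points = list(zip(sorted(age), sorted(fat)))

print(scatter_points[:3])  # [(23, 9.5), (23, 26.5), (27, 7.8)]
print(qq_points[:3])       # [(23, 7.8), (23, 9.5), (27, 17.8)]
```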
Q5
Given two objects represented by the tuples (22, 1,
42, 10) and (20, 0, 36, 8):
• (a) Compute the Euclidean distance between the
two objects.
• (b) Compute the Manhattan distance between the
two objects.
• (c) Compute the Minkowski distance between the
two objects, using h = 3.
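• To compute the distance between objects described by numeric
attributes, let $i = (x_{i1}, x_{i2}, \ldots, x_{ip})$ and $j = (x_{j1}, x_{j2}, \ldots, x_{jp})$ be two objects with p attributes:
• Euclidean distance: $d(i, j) = \sqrt{(x_{i1} - x_{j1})^2 + (x_{i2} - x_{j2})^2 + \cdots + (x_{ip} - x_{jp})^2}$
• The Manhattan (or city block) distance: $d(i, j) = |x_{i1} - x_{j1}| + |x_{i2} - x_{j2}| + \cdots + |x_{ip} - x_{jp}|$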
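Part (c) asks for the Minkowski distance, which this slide does not define. Its standard form generalizes both measures listed here; for two objects i and j with p numeric attributes and a real order h ≥ 1:

```latex
d(i, j) = \left( \sum_{f=1}^{p} \lvert x_{if} - x_{jf} \rvert^{h} \right)^{1/h}
```

Setting h = 1 gives the Manhattan distance and h = 2 gives the Euclidean distance.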
(a) Compute the Euclidean distance between the
two objects.
$d = \sqrt{(22 - 20)^2 + (1 - 0)^2 + (42 - 36)^2 + (10 - 8)^2} = \sqrt{4 + 1 + 36 + 4} = \sqrt{45} \approx 6.7082$
(b) Compute the Manhattan distance between the
two objects.
$d = |22 - 20| + |1 - 0| + |42 - 36| + |10 - 8| = 2 + 1 + 6 + 2 = 11$
(c) Compute the Minkowski distance between the
two objects, using h = 3.
$d = \left(|22 - 20|^3 + |1 - 0|^3 + |42 - 36|^3 + |10 - 8|^3\right)^{1/3} = (8 + 1 + 216 + 8)^{1/3} = 233^{1/3} \approx 6.1534$
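All three answers can be checked with one function, since the Euclidean and Manhattan distances are the Minkowski distance with h = 2 and h = 1. A minimal Python sketch, assuming the tuples are plain numeric attributes with no weighting or normalization; `minkowski` is an illustrative name, not a library function:

```python
def minkowski(x, y, h):
    """Minkowski distance of order h between two equal-length numeric tuples."""
    if len(x) != len(y):
        raise ValueError("tuples must have the same number of attributes")
    return sum(abs(a - b) ** h for a, b in zip(x, y)) ** (1 / h)

p = (22, 1, 42, 10)
q = (20, 0, 36, 8)

print(round(minkowski(p, q, 2), 4))  # Euclidean: 6.7082
print(minkowski(p, q, 1))            # Manhattan: 11.0
print(round(minkowski(p, q, 3), 4))  # h = 3: 6.1534
```

As h grows, the largest attribute difference (here 42 - 36 = 6) dominates, so the distance shrinks toward 6.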