Uploaded by MD SHAKIB MAHAMUD

Assignment 2

Data Warehouse and Data Mining
Practice Set
1. (20 points) The following dataset contains students' exam scores. Please write a
program to compute the following statistical descriptions of the test scores.
(a) Max, min
(b) Q1, median, Q3
(c) Mean, variance
(d) Mode
Last name   First name   Test score
Duckey      Donald       85
Goof        Goofy        89
Brave       Balto        93
Snow        White        55
Alice       Wonderful    89
Sleeping    Beauty       72
Simba       Lion         98
Dumbo       Elephant     90
Brown       Deer         66
Johny       Jocky        91
Grump       Grumpy       75
Hap         Happy        82
Dop         Dopey        60
Bash        Bashful      85
Sleep       Sleepy       61
Sneeze      Sneezey      83
Shelly      Malik        95
Chelsea     Tomek        89
Angela      Clodfelter   73
Allison     Nields       35
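One possible starting point for question 1 is sketched below, using only Python's standard `statistics` module. The variable names are illustrative. Two choices are assumptions rather than part of the assignment: the quartiles use the "inclusive" interpolation method (other textbook conventions give slightly different Q1 and Q3), and both the population and the sample variance are reported because the question does not say which one is meant.

```python
import statistics

# Test scores copied from the table above, in row order.
scores = [85, 89, 93, 55, 89, 72, 98, 90, 66, 91,
          75, 82, 60, 85, 61, 83, 95, 89, 73, 35]

# (a) Max and min
highest, lowest = max(scores), min(scores)

# (b) Q1, median, Q3
# Assumption: "inclusive" linear interpolation; other conventions differ slightly.
q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")

# (c) Mean and variance (both variance conventions, since the question is silent)
mean = statistics.mean(scores)
population_variance = statistics.pvariance(scores)  # divides by n
sample_variance = statistics.variance(scores)       # divides by n - 1

# (d) Mode; multimode returns every value tied for the highest count
modes = statistics.multimode(scores)

print("max:", highest, "min:", lowest)
print("Q1:", q1, "median:", median, "Q3:", q3)
print("mean:", mean)
print("population variance:", population_variance)
print("sample variance:", sample_variance)
print("mode(s):", modes)
```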
2. (20 points) The following patient record table contains the attributes name, test-1, test-2, test-3,
test-4, test-5, test-6, test-7, and test-8, where name is an object identifier and the remaining
attributes are asymmetric binary. Please write a program to calculate the Jaccard coefficient (similarity)
between each pair of the three patients: Peter, Mary, and Jim.
name    test-1  test-2  test-3  test-4  test-5  test-6  test-7  test-8
Peter   Y       Y       N       N       Y       N       N       Y
Mary    N       Y       N       N       Y       N       Y       Y
Jim     N       N       N       N       Y       N       N       N
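A minimal sketch for question 2 follows. It treats Y as the positive value of each asymmetric binary attribute, so a pair of N values is ignored and the Jaccard coefficient is the number of tests where both patients are Y divided by the number of tests where at least one is Y. The string encoding of each row, the function name `jaccard`, and the choice to return 0.0 when neither patient has any Y are assumptions made for illustration.

```python
from itertools import combinations

# Each string lists test-1 .. test-8 for one patient, read from the table above.
records = {
    "Peter": "YYNNYNNY",
    "Mary":  "NYNNYNYY",
    "Jim":   "NNNNYNNN",
}

def jaccard(a, b):
    """Jaccard similarity for asymmetric binary attributes (Y is the positive value)."""
    both = sum(x == "Y" and y == "Y" for x, y in zip(a, b))
    either = sum(x == "Y" or y == "Y" for x, y in zip(a, b))
    # Assumption: if neither patient has a positive test, report 0.0.
    return both / either if either else 0.0

similarities = {
    (p, q): jaccard(records[p], records[q])
    for p, q in combinations(records, 2)
}

for (p, q), value in similarities.items():
    print(f"Jaccard({p}, {q}) = {value:.2f}")
```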
3. (20 points) Given two three-dimensional data points, x1 = (1, 4, 5) and x2 = (3, 7, 8), please write a
program to calculate the dissimilarity or similarity between the two points using the following measures.
(a) Manhattan distance
(b) Euclidean distance
(c) Cosine similarity
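The three measures in question 3 can be sketched with the standard library alone. The function names below are illustrative, and the code assumes both points have the same number of dimensions and that neither is the zero vector (cosine similarity is undefined in that case).

```python
import math

x1 = (1, 4, 5)
x2 = (3, 7, 8)

def manhattan(a, b):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(p - q) for p, q in zip(a, b))

def euclidean(a, b):
    """Square root of the sum of squared coordinate differences (L2 distance)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b)

print("Manhattan distance:", manhattan(x1, x2))
print("Euclidean distance:", round(euclidean(x1, x2), 4))
print("Cosine similarity:", round(cosine_similarity(x1, x2), 4))
```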
4. (20 points) Suppose two stocks A and B have the following (A, B) values in one week: (4, 6), (6, 9), (10,
11), (8, 12), (12, 15). Please write a program to calculate Pearson's correlation coefficient between the two
stocks.
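A sketch for question 4 is given below. It assumes each pair is read as (value of A, value of B) for one day, and it computes Pearson's coefficient directly from the deviations about the means, so the result does not depend on whether a population or sample convention is used. The name `pearson` is illustrative.

```python
import math

# Assumption: each pair is (stock A, stock B) for one day of the week.
pairs = [(4, 6), (6, 9), (10, 11), (8, 12), (12, 15)]
a_values = [a for a, _ in pairs]
b_values = [b for _, b in pairs]

def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(spread_x * spread_y)

r = pearson(a_values, b_values)
print(f"Pearson correlation between A and B: {r:.4f}")
```

A value close to +1 would indicate that the two stocks tend to rise and fall together.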
5. (20 points) Last year, eight randomly selected students took a math aptitude test before they began
their statistics course. In the table below, the X column shows scores on the aptitude test. Similarly, the
Y column shows statistics grades. Please answer the three questions.
(a) What linear regression equation best predicts statistics performance, based on math aptitude
scores?
(b) If a student made an 80 on the aptitude test, what grade would we expect her to make in statistics?
(c) How well does the regression equation fit the data?
Student ID   X    Y
1            78   77
2            65   68
3            88   90
4            95   85
5            85   95
6            80   70
7            70   65
8            60   70
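Although question 5 asks for answers rather than a program, the three parts can be checked with a short least-squares sketch. It assumes simple linear regression of Y on X fitted by ordinary least squares, and it uses the coefficient of determination (R squared) as the measure of fit for part (c); both are assumptions about what the question intends. All names are illustrative.

```python
# Aptitude scores (X) and statistics grades (Y) for students 1 to 8.
x = [78, 65, 88, 95, 85, 80, 70, 60]
y = [77, 68, 90, 85, 95, 70, 65, 70]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross-products about the means
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)

# (a) Least-squares line: y_hat = intercept + slope * x
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

def predict(score):
    """Predicted statistics grade for a given aptitude score."""
    return intercept + slope * score

# (b) Expected grade for an aptitude score of 80
predicted_80 = predict(80)

# (c) Coefficient of determination: share of the variance in Y explained by X
r_squared = s_xy ** 2 / (s_xx * s_yy)

print(f"y_hat = {intercept:.3f} + {slope:.3f} * x")
print(f"predicted grade at x = 80: {predicted_80:.2f}")
print(f"R^2 = {r_squared:.3f}")
```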