Data Warehouse and Data Mining Practice Set 1. (20 points) Given the following dataset including the records of students exam scores. Please write a program to give out the following statistical description of data. (a) Max, min (b) Q1, median, Q3 (c) Mean, variance (d) Mode Last name Duckey Goof Brave Snow Alice Sleeping Simba Dumbo Brown Johny Grump Hap Dop Bash Sleep Sneeze Shelly Chelsea Angela Allison First name Donald Goofy Balto White Wonderful Beauty Lion Elephant Deer Jocky Grumpy Happy Dopey Bashful Sleepy Sneezey Malik Tomek Clodfelter Nields Test score 85 89 93 55 89 72 98 90 66 91 75 82 60 85 61 83 95 89 73 35 2. (20 points) Given the following patient record table, which contains the attributes name, test-1, test2, test-3, test-4, test-5, test-6, test-7, test-8, where name is an object identifier and the remaining attributes are asymmetric binary. Please write a program to calculate the Jaccard coefficient (similarity) between each pair of the three patients – Peter, Mary, and Jim. name Peter Mary Jim test-1 Y N N test-2 Y Y N test-3 N N N test-4 N N N 1 test-5 Y Y Y test-6 N N N test-7 N Y N test-8 Y Y N 3. (20 points) Given two three-dimensional data points: x1 = (1, 4, 5) and x2 = (3, 7, 8), Please write a program to calculate the dissimilarity/similarity between the two points by the following measures. (a) Manhattan distance (b) Euclidean distance (c) Cosine similarity 4. (20 points) Suppose two stocks A and B have the following values in one week: (4, 6), (6, 9), (10, 11), (8, 12), (12, 15). Please write a program to calculate the Pearson’s coefficient between the two stocks. 5. (20 points) Last year, eight randomly selected students took a math aptitude test before they began their statistics course. In the table below, the X column shows scores on the aptitude test. Similarly, the Y column shows statistics grades. Please answer the three questions. (a) What linear regression equation best predicts statistics performance, based on math aptitude scores? (b) If a student made an 80 on the aptitude test, what grade would we expect her to make in statistics? (c) How well does the regression equation fit the data? Student ID 1 2 3 4 5 6 7 8 X 78 65 88 95 85 80 70 60 2 Y 77 68 90 85 95 70 65 70