MIS 2502 Section 03
Final Exam Practice
May 10th 2011, Speakman Hall 114
10:30 am – 11:30 am

PLEASE DO NOT TURN THE PAGES UNTIL INSTRUCTED BY THE PROFESSOR.

Please read the following instructions carefully before you start the exam.

You will have 60 minutes to complete this exam. This exam has 6 pages (including the cover) and is worth 100 points. The number of points for each question is indicated next to the question number. Budget your time accordingly. Space is provided to answer each of the questions. If you need more space, please use the back and number your answers. Be sure to write legibly; I cannot grade an exam that I can’t read. Good luck!

TUID:
Name:

Please try your best to practice with this exam. If you have finished all labs and homework, I hope your score is higher than 85.

I. Discuss briefly (no more than 30 words each) whether or not each of the following activities is a data mining task. Whether the answer is yes or no, name the task to the best of your knowledge. (8 x 4 = 32 points total)

Grading: If “yes” or “no” is correct, give 4 points. Take 2 points off at most if the answer is not blank.

1. Computing the revenue of a company
Answer: No. This is just a common calculation. Data mining aims to explore underlying patterns that are not obvious.

2. Sorting a customer database based on customer names
Answer: No. This is just a sorting task, which can be done with SQL.

3. Predicting the future stock price of a company using historical records
Answer: Yes. Regression can do that and is a typical data mining task.

4. Clustering Temple students by their registered department names
Answer: No. Department names are taken as given, so this is not an underlying pattern.

5. Clustering Temple students into several groups by analyzing course GPA
Answer: Yes.

6. Testing IQ for each student
Answer: No. That’s just an examination, a way to collect data.

7. Predicting the music preference of a user based on his or her favorite songs
Answer: Yes.

8. Using ERD theory to establish a data warehouse for a company
Answer: No.
That’s ERD and data warehouse design.

II. Multiple Choice Questions. Each has only one answer. (5 x 4 = 20 points total)

B_____1. The software that we use for data mining practice is developed by:
A. SPSS
B. SAS
C. Oracle
D. Microsoft

C_____2. Microsoft usually clusters all potential customers into three groups: home users, professional users, and enterprise users. For each product, Microsoft customizes three different versions based on the needs of each user group. Apple, however, prefers providing just one universal version, since it does not cluster its customers by needs. From the clustering view of data mining, which clustering strategy is dominantly better?
A. Microsoft
B. Apple
C. Hard to say

C_____3. OLAP (On-Line Analytical Processing) is a set of multidimensional data analysis techniques. Which of the following is not an OLAP technique?
A. Slice
B. Dice
C. Cluster
D. Pivot

B_____4. Which of the following questions could NOT be answered with OLAP?
A. What are the total sales for each product?
B. How much salary did each employee receive for each month?
C. Which salesperson has sold the most?
D. Which product did each salesperson sell the most?

C_____5. Which of the following statements about regression is NOT correct?
A. R square measures how much variance can be explained by a model
B. If the p value of a factor is smaller than 0.1, this factor has significant influence on the target
C. Predictors of a regression model can be correlated with each other
D. Estimated coefficients are meaningless if the corresponding p values are larger than 0.1

III. Data Analytics (48 points)

A marketing manager is in charge of promoting the company's dungaree jeans, which come in four different styles: leisure, stretch, fashion, and original.

1. (10 points) The manager first did a clustering and segmentation analysis in order to understand the sales features of different stores and make different marketing strategies.
In the output, he saw the following information:

Please use your own words to explain the features of this segment.
Answer: This segment contains stores selling a higher-than-average number of stretch jeans.
Grading: if the answer is close, give full points.

2. (10 points) Understanding customer purchase behavior is essential, so the manager also analyzed the historical transaction records to identify the underlying association rules. Finally he found this information in the output:

Rule 1: Original → Leisure (lift value = 6.0)
Rule 2: Leisure → Original (lift value = 5.2)

What can you get from the above output?
Answer: Customers who purchase the original style are also likely to purchase the leisure style, and vice versa. So it’s better for stores to put these two styles next to each other.
Grading: if the answer is close, give full points.

3. (20 points) The manager has planned and implemented several marketing plans. After a year, he decides to evaluate his marketing plans with a regression on their operational data. In this regression, the dependent variable and independent variables (predictors) are listed in Table 1.

Table 1. Regression Variables

Variable Name   1 Unit   Description                                       Role
MarketShare     1%       Weekly market share that this company is taking   Target
DirectMail      $1.00    Weekly spend on Direct Mail advertisement         Input
Internet        $1.00    Weekly spend on Internet advertisement            Input
PrintMedia      $1.00    Weekly spend on Print Media advertisement         Input
TVRadio         $1.00    Weekly spend on TV or Radio advertisement         Input

Regression Output:

Model Fit Statistics

R-Square      0.5267      Adj R-Sq      0.5038
AIC        -586.7872      BIC        -584.1920
SBC        -574.4005      C(p)          5.0000

Analysis of Maximum Likelihood Estimates

Parameter    DF   Estimate    Standard Error   t Value   P value
Intercept    1    0.7547      0.0153           49.23     <.0001
DirectMail   1    -0.00011    0.000049         -2.25     0.0272
Internet     1    0.00025     0.000037         -6.98     <.0001
PrintMedia   1    -0.00049    0.000092         -5.34     <.0001
TVRadio      1    0.000026    0.000019         1.35      0.1820

Please answer the following questions based on the above output:

a.
How well does this regression model fit?
Answer: R square is 52.67%. That means 52.67% of the variance can be explained by this model.
Grading: if the answer is close, give full points.

b. Based on the output, which marketing strategy is the best?
Answer: Internet, because it’s the only strategy with a significant positive coefficient, so it is the only one shown to help increase the market share.
Grading: if the answer is close, give full points.

c. How do you explain the estimated coefficient of Internet?
Answer: The estimated coefficient is 0.00025. With the other predictors held fixed, every additional $1.00 invested in Internet advertisement is associated with an increase of 1 x 0.00025 = 0.00025 in market share, measured in the units of Table 1.
Grading: if the answer is close, give full points.

d. If you were the marketing manager, what actions would you take based on the above regression result?
Answer: Cancel all other marketing plans and only invest in Internet advertisement.
Grading: if the answer is close, give full points.

4. (8 points) In practice, tools using collective intelligence have performed better than theorists can explain, especially in the IT industry. Can this manager use collective intelligence to improve his marketing performance? If yes, please design a detailed strategy to explain the power of collective intelligence in marketing. If no, please explain.
Answer: Definitely yes. For example, viral marketing, where a mechanism is designed to have customers promote the products to their friends.
Grading: if the answer is yes, give at least 6 points. If no, but an explanation is given, give 4 points.
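
For practice with questions III.3 b and c, the reasoning over the regression output can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the exam or of the SAS output: the estimates and p values are copied from the table in III.3, the p values reported as <.0001 are entered as 0.0001 as an upper bound, the 0.1 threshold is the one used in question II.5, and the function names significant_positive and predicted_change are hypothetical.

```python
# Estimates and p values copied from the regression output in question III.3.
# P values reported as "<.0001" are entered as 0.0001 (an upper bound, an assumption).
coefficients = {
    "DirectMail": {"estimate": -0.00011, "p_value": 0.0272},
    "Internet":   {"estimate": 0.00025,  "p_value": 0.0001},
    "PrintMedia": {"estimate": -0.00049, "p_value": 0.0001},
    "TVRadio":    {"estimate": 0.000026, "p_value": 0.1820},
}

ALPHA = 0.1  # significance threshold taken from question II.5


def significant_positive(coefs, alpha=ALPHA):
    """Return predictors whose coefficient is positive and whose p value is below alpha."""
    return [
        name
        for name, row in coefs.items()
        if row["p_value"] < alpha and row["estimate"] > 0
    ]


def predicted_change(coefs, name, extra_dollars):
    """Change in MarketShare (in Table 1 units) for extra weekly spend on one channel,
    holding the other predictors fixed."""
    return coefs[name]["estimate"] * extra_dollars


if __name__ == "__main__":
    print(significant_positive(coefficients))
    print(predicted_change(coefficients, "Internet", 100))
```

Internet is the only predictor that passes both checks: TVRadio also has a positive estimate, but its p value of 0.1820 is above the threshold, while DirectMail and PrintMedia are significant but negative.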