MIS 2502 Section 03
Final Exam
May 10th 2011, Speakman Hall 114
10:30 am – 11:30 am

PLEASE DO NOT TURN THE PAGES UNTIL INSTRUCTED BY THE PROFESSOR.

Please read the following instructions carefully before you start the exam.

You will have 60 minutes to complete this exam. This exam has 6 pages (including the cover) and is worth 100 points. The number of points for each question is indicated next to the question number. Budget your time accordingly.

Space is provided to answer each of the questions. If you need more space, please use the back of the page and number your answers. Be sure to write legibly – I cannot grade an exam that I can’t read.

Good luck!

TUID:
Name:

I. For each of the following activities, briefly discuss (in no more than 30 words) whether or not it is a data mining task. Whether your answer is yes or no, name the type of task it is. (8 x 4 = 32 points total)

1. Computing the revenue of a company
2. Sorting a customer database based on customer names
3. Predicting the future stock price of a company using historical records
4. Clustering Temple students by their registered department names
5. Clustering Temple students into several groups by analyzing course GPA
6. Testing IQ for each student
7. Predicting the music preference of a user based on his or her favorite songs
8. Using ERD theory to establish a data warehouse for a company

II. Multiple-Choice Questions. Each question has only one correct answer. (5 x 4 = 20 points total)

_____1. The software that we use for data mining practice is developed by:
A. SPSS
B. SAS
C. Oracle
D. Microsoft

_____2. Microsoft usually clusters all potential customers into three groups: home users, professional users, and enterprise users. For each product, Microsoft customizes three different versions based on the needs of each user group. However, Apple prefers to provide just one universal version, since it does not cluster its customers by their needs. From a data mining clustering perspective, which strategy is clearly better?
A. Microsoft
B. Apple
C. Hard to say

_____3. OLAP (On-Line Analytical Processing) is a set of multidimensional data analysis techniques. Which of the following is not an OLAP technique?
A. Slice
B. Dice
C. Cluster
D. Pivot

_____4. Which of the following questions could NOT be answered with OLAP?
A. What are the total sales for each product?
B. How much salary did each employee receive in each month?
C. Which salesperson has sold the most?
D. Which product did each salesperson sell the most?

_____5. Which of the following statements about regression is NOT correct?
A. R-square measures how much variance can be explained by a model.
B. If the p value of a factor is smaller than 0.1, this factor has a significant influence on the target.
C. Predictors of a regression model can be correlated with each other.
D. Estimated coefficients are meaningless if the corresponding p values are larger than 0.1.

III. Data Analytics (48 points)

A marketing manager is in charge of promoting the company's dungaree jeans, which come in four styles: leisure, stretch, fashion, and original.

1. (10 points) The manager first performed a clustering and segmentation analysis to understand the sales characteristics of different stores and to develop different marketing strategies for them. In the output, he saw the following information:
Please explain, in your own words, the characteristics of this segment.

2. (10 points) Understanding customer purchase behavior is essential, so the manager also analyzed the historical transaction records to identify the underlying association rules. He found the following information in the output:
Rule 1: Original → Leisure (lift value = 6.0)
Rule 2: Leisure → Original (lift value = 5.2)
What can you conclude from the above output?

3. (20 points) The manager has planned and implemented several marketing plans. After a year, he decides to evaluate these plans by running a regression on the company's operational data. The dependent variable and the independent variables (predictors) of this regression are listed in Table 1.
Table 1. Regression Variables

Variable Name   Unit    Description                                        Role
MarketShare     1%      Weekly market share that this company is taking    Target
DirectMail      $1.00   Weekly spend on Direct Mail advertisement          Input
Internet        $1.00   Weekly spend on Internet advertisement             Input
PrintMedia      $1.00   Weekly spend on Print Media advertisement          Input
TVRadio         $1.00   Weekly spend on TV or Radio advertisement          Input

Regression Output:

Model Fit Statistics
R-Square     0.5267       Adj R-Sq     0.5038
AIC       -586.7872       BIC       -584.1920
SBC       -574.4005       C(p)         5.0000

Analysis of Maximum Likelihood Estimates
                                      Standard
Parameter    DF    Estimate           Error       t Value    P value
Intercept     1    0.7547             0.0153        49.23    <.0001
DirectMail    1    -0.00011           0.000049      -2.25    0.0272
Internet      1    0.00025            0.000037      -6.98    <.0001
PrintMedia    1    -0.00049           0.000092      -5.34    <.0001
TVRadio       1    0.000026           0.000019       1.35    0.1820

Please answer the following questions based on the above output:
a. How well does this regression model fit the data?
b. Based on the output, which marketing strategy is the best?
c. How would you interpret the estimated coefficient of Internet?
d. If you were the marketing manager, what actions would you take based on the above regression results?

4. (8 points) In practice, tools using collective intelligence have performed better than theorists can explain, especially in the IT industry. Can this manager use collective intelligence to improve his marketing performance? If yes, please design a detailed strategy that demonstrates the power of collective intelligence in marketing. If no, please explain why not.
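Study aid for Question III.3: in a regression coefficient table, the t Value for each parameter is its Estimate divided by its Standard Error, and the P value is the two-sided probability of a t statistic at least that extreme if the true coefficient were zero. The minimal Python sketch below computes both from the Estimate and Standard Error columns. The helper names `t_value` and `approx_two_sided_p` are hypothetical. The p-value uses a standard normal approximation (an assumption made to stay within the standard library), not the t distribution the software uses, so its values will be somewhat smaller than the printed P values.

```python
import math

# (parameter, estimate, standard error) copied from the
# "Analysis of Maximum Likelihood Estimates" output in Question III.3.
ROWS = [
    ("Intercept", 0.7547, 0.0153),
    ("DirectMail", -0.00011, 0.000049),
    ("Internet", 0.00025, 0.000037),
    ("PrintMedia", -0.00049, 0.000092),
    ("TVRadio", 0.000026, 0.000019),
]


def t_value(estimate: float, std_error: float) -> float:
    """Number of standard errors the estimate lies away from zero."""
    return estimate / std_error


def approx_two_sided_p(t: float) -> float:
    """Two-sided p-value under a standard normal approximation.

    Assumption: a normal reference distribution is used instead of the
    t distribution, so results are slightly smaller than exact t-test p-values.
    """
    return math.erfc(abs(t) / math.sqrt(2.0))


if __name__ == "__main__":
    for name, est, se in ROWS:
        t = t_value(est, se)
        print(f"{name:<11} t = {t:7.2f}   approx p = {approx_two_sided_p(t):.4f}")
```

Recomputed t values may differ slightly from the printed ones because the Estimate and Standard Error columns are rounded in the output.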