Lab. Regression of Data Mining

First Name:
Last Name:

Answer the questions and hand in a hard copy by Thursday (April 21st). This is not a group project; you need to finish it by yourself.

Regression Practice 1
Goal: understand the birth weight.

1. Create a new project named Birth Weight Analysis.
2. Create a new diagram named Birth Weight Analysis.
3. Specify the data source: SAS TABLE -> Browse -> Sashelp -> Bweight. Set weight as the target and keep all the other variables as inputs. The variable descriptions are as follows:

Variable    Description
weight      Infant’s Birth Weight
black       Indicator of Black Mother
married     Indicator of Married Mother
boy         Indicator of Boy
visit       Prenatal Visit: 0 - no visit, 1 - visit in second trimester, 2 - visit in last trimester, 3 - visit in first trimester
ed          Mother’s Education Level: 0 - high school, 1 - some college, 2 - college, 3 - less than high school
smoke       Indicator of Smoking Mother
cigsper     Number of Cigarettes Smoked per Day
mom_age     Mother’s Age
m_wtgain    Mother’s Weight Gain during Pregnancy

4. Drag the bweight data to the diagram. Then drag a Regression node from the Model tab. Connect the two nodes and run.
5. In the results, find the R-square value of this model: 10.65%.
6. Based on the R square, is this a good prediction model for the baby weight? How to improve the model effectiveness?
   The model is statistically significant, so it predicts better than guessing. However, 89.35% of the variance in birth weight is not explained by the model, so it is not a good model for prediction purposes. To improve the model, we can include more variables, such as information about the father.
7.
In the estimates part, find the following outputs:

Analysis of Maximum Likelihood Estimates

                             Standard
Parameter    DF   Estimate   Error      t Value   Pr > |t|
Intercept     1   3295.6     11.2256    293.58    <.0001
black         1   -196.1      7.0278    -27.91    <.0001
boy           1   109.8       4.7936     22.91    <.0001
cigsper       1   -4.2229     0.8971     -4.71    <.0001
ed            1   -1.3560     2.2062     -0.61    0.5388
m_wtgain      1   8.7199      0.1872     46.59    <.0001
married       1   76.2923     6.2098     12.29    <.0001
mom_age       1   5.9424      0.4518     13.15    <.0001
smoke         1   -168.8     12.4646    -13.54    <.0001
visit         1   6.4254      3.4559      1.86    0.0630

7.1 Is Mom’s education level associated with baby’s weight? If yes, how?
    No. The p value for ed (0.5388) is higher than 0.1, so education level is not significantly associated with the baby’s weight.
7.2 Is Mom’s marriage status associated with baby’s weight? If yes, how?
    Yes, because the p value for the married coefficient is smaller than 0.1. Holding the other inputs fixed, the baby is expected to weigh 76.2923 grams more if the mother is married.
7.3 Any significant difference between boy and girl? Who weighs more?
    Yes, because the p value for boy is smaller than 0.1. A boy is expected to weigh 109.8 grams more than a girl.
7.4 Is it necessary to visit OB more often to control the baby’s weight?
    Yes, at the 0.1 level: the p value for visit (0.0630) is smaller than 0.1, and the estimate (6.4254) is positive.
7.5 A bigger baby makes the Mom more likely to have difficult labor. If that is true, do you suggest a woman to give birth to the baby when she is younger or older?
    The p value for mom_age is smaller than 0.1. For each additional year of the mother’s age, the baby is expected to weigh an additional 5.94 grams. So an older mom is more likely to have a bigger baby and therefore a more difficult labor; under that premise, giving birth at a younger age is suggested.
7.6 A bigger baby makes the Mom more likely to have difficult labor. If that is true, do you suggest a Mom to smoke or not?
    Although a smoking mother is expected to have a smaller baby (the estimate for smoke is -168.8 grams, and each cigarette per day lowers the weight by a further 4.2229 grams), smoking may lead to other medical problems. The result is not enough to make a suggestion here.

Regression Practice 2
Goal: examine marketing strategies.
1. Create a new project named Ads Investment Analysis.
2. Create a new diagram named Ads Investment Analysis.
3. Specify the data source: Metadata Repository -> Browse -> ABA1 -> saledata. Change the role of each variable as follows: DirectMail, Internet, PrintMedia, and TVRadio are advertising investment variables. SalesAmount is the sales record under each advertisement strategy.

The goal of this study is to understand the best investment of advertisement in order to reach the highest return on investment (ROI).

Check the regression results and answer the following questions:

1. How good is the model?
   R square is almost 90%. That means about 90% of the variance in the sales amount can be explained by the model.
2. What are the effective advertisement strategies?
   Direct Mail, Internet and Print Media are effective because their p values are smaller than 0.1 and their estimated coefficients are positive.
3. Which advertisement strategy is the best?
   Internet, because it is significant (p value less than 0.1) and it has the highest estimated coefficient.
4. Which advertisement strategy is effective but the worst?
   Direct Mail. Its p value is significant and its estimated coefficient is positive, but that coefficient is the lowest among the effective strategies.
5. Which advertisement strategy should be abandoned?
   TV/Radio should be abandoned, since its estimated coefficient is negative, which means it is associated with a decrease in sales.
6. If you are the marketing manager, how will you change the marketing plan?
   Place more investment in Internet advertising until this strategy is saturated.
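
Both practices read a regression table the same way: check whether each p value is below 0.1, then read the sign and size of the estimate. As a rough illustration only, the sketch below applies that rule to the Practice 1 estimates table. It is written in Python rather than SAS, the function names are made up for this example, "<.0001" is stored as 0.0001 as an upper bound, and the "10 cigarettes per day" comparison is a hypothetical case, not part of the lab.

```python
# Estimates and p values copied from the Practice 1 estimates table.
# "<.0001" is stored as 0.0001 (an upper bound), which is enough for the 0.1 cutoff.
ESTIMATES = {
    "Intercept": (3295.6, 0.0001),
    "black": (-196.1, 0.0001),
    "boy": (109.8, 0.0001),
    "cigsper": (-4.2229, 0.0001),
    "ed": (-1.3560, 0.5388),
    "m_wtgain": (8.7199, 0.0001),
    "married": (76.2923, 0.0001),
    "mom_age": (5.9424, 0.0001),
    "smoke": (-168.8, 0.0001),
    "visit": (6.4254, 0.0630),
}

ALPHA = 0.1  # significance level used throughout the answers


def significant_inputs(estimates=ESTIMATES, alpha=ALPHA):
    """Return the input names whose p value is below alpha (intercept excluded)."""
    return [
        name
        for name, (_, p_value) in estimates.items()
        if name != "Intercept" and p_value < alpha
    ]


def weight_change(deltas, estimates=ESTIMATES):
    """Predicted change in birth weight (grams) when the listed inputs change
    by the given amounts and every other input is held fixed."""
    return sum(estimates[name][0] * delta for name, delta in deltas.items())


if __name__ == "__main__":
    print("Significant at 0.1:", significant_inputs())
    print("Unexplained variance (%):", round(100 - 10.65, 2))
    print("Boy vs. girl (g):", weight_change({"boy": 1}))
    print("Mother 5 years older (g):", round(weight_change({"mom_age": 5}), 3))
    # Hypothetical comparison: smoker at 10 cigarettes per day vs. non-smoker.
    print("Smoker, 10/day (g):", round(weight_change({"smoke": 1, "cigsper": 10}), 3))
```

The sketch works with differences between two cases rather than absolute predicted weights, so it does not depend on how mom_age or m_wtgain are scaled in the data set.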