Regression-Lab-solutions

Lab. Regression of Data Mining
First Name:
Last Name:
Answer the questions and hand in a hard copy by Thursday (April 21st).
This is not a group project. You need to complete it by yourself.
Regression Practice 1:
Goal: understand the birth weight.
1. Create a new project named Birth Weight Analysis
2. Create a new diagram named Birth Weight Analysis
3. Specify the data source: SAS TABLE -> Browse -> Sashelp -> Bweight. Set weight as our target
and keep all the other variables as our inputs. The variable descriptions are as follows:
Variable   Description
weight     Infant’s Birth Weight
black      Indicator of Black Mother
married    Indicator of Married Mother
boy        Indicator of Boy
visit      Prenatal Visit: 0 - no visit, 1 - visit in second trimester,
           2 - visit in last trimester, 3 - visit in first trimester
ed         Mother’s Education Level: 0 - high school, 1 - some college,
           2 - college, 3 - less than high school
smoke      Indicator of Smoking Mother
cigsper    Number of Cigarettes Smoked per Day
mom_age    Mother’s Age
m_wtgain   Mother’s Weight Gain during Pregnancy
4. Drag the bweight data to the diagram. Then drag a regression node from the model
tab. Connect the two nodes and run.
5. In the results, find the R square value of this model: 10.65%.
6. Based on the R square, is this a good prediction model for the baby weight? How can the
model’s effectiveness be improved?
The model is statistically significant, so it predicts better than guessing. However, 89.35% of
the variance in birth weight is not explained by the model, so it is not a good model for
prediction purposes.
To improve the model’s effectiveness, we can include more variables, such as information about
the father.
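The 89.35% figure in the answer above is simply one minus the R square. A minimal sketch of that arithmetic in Python; the variable names are ours and are not part of the regression node's output:

```python
# R square reported in step 5 (10.65%), written as a proportion.
r_square = 0.1065

# Share of the variance in birth weight that the model leaves unexplained.
unexplained = 1.0 - r_square

print(f"Explained:   {r_square:.2%}")
print(f"Unexplained: {unexplained:.2%}")
```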
7. In the estimates part, find the following outputs:
Analysis of Maximum Likelihood Estimates

Parameter   DF   Estimate   Standard Error   t Value   Pr > |t|
Intercept    1     3295.6          11.2256    293.58     <.0001
black        1     -196.1           7.0278    -27.91     <.0001
boy          1      109.8           4.7936     22.91     <.0001
cigsper      1    -4.2229           0.8971     -4.71     <.0001
ed           1    -1.3560           2.2062     -0.61     0.5388
m_wtgain     1     8.7199           0.1872     46.59     <.0001
married      1    76.2923           6.2098     12.29     <.0001
mom_age      1     5.9424           0.4518     13.15     <.0001
smoke        1     -168.8          12.4646    -13.54     <.0001
visit        1     6.4254           3.4559      1.86     0.0630
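Each t value in the table is the estimate divided by its standard error. The sketch below recomputes that ratio from the printed figures as a sanity check. Because the printed estimates are rounded, the recomputed values can differ from the reported ones by about 0.01. The dictionary layout and function name are our own choices, not SAS output.

```python
# (estimate, standard error, reported t value) copied from the estimates table.
estimates = {
    "Intercept": (3295.6, 11.2256, 293.58),
    "black": (-196.1, 7.0278, -27.91),
    "boy": (109.8, 4.7936, 22.91),
    "cigsper": (-4.2229, 0.8971, -4.71),
    "ed": (-1.3560, 2.2062, -0.61),
    "m_wtgain": (8.7199, 0.1872, 46.59),
    "married": (76.2923, 6.2098, 12.29),
    "mom_age": (5.9424, 0.4518, 13.15),
    "smoke": (-168.8, 12.4646, -13.54),
    "visit": (6.4254, 3.4559, 1.86),
}


def t_ratio(estimate, std_error):
    """Return the t value: the estimate divided by its standard error."""
    return estimate / std_error


for name, (est, se, reported_t) in estimates.items():
    computed_t = t_ratio(est, se)
    print(f"{name:<10} computed t = {computed_t:8.2f}   reported t = {reported_t:8.2f}")
```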
7.1 Is Mom’s education level associated with baby’s weight? If yes, how?
No. The p value for ed (0.5388) is higher than 0.1, so education level is not significantly
associated with the baby’s weight.
7.2 Is Mom’s marital status associated with baby’s weight? If yes, how?
Yes, because the p value for the married coefficient is smaller than 0.1. Holding the other
variables constant, the baby is predicted to weigh 76.2923 grams more if the mother is married.
7.3 Any significant difference between boy and girl? Who weighs more?
Yes, because the p value for boy is smaller than 0.1. Holding the other variables constant, a boy
is predicted to weigh 109.8 grams more than a girl.
7.4 Is it necessary to visit OB more often to control the baby’s weight?
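Yes, since the p value for visit (0.0630) is smaller than 0.1. Note that it is larger than 0.05,
so the evidence is weaker than for the other significant variables.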
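Answers 7.1 to 7.4 all apply the same rule: a variable counts as significant when its Pr > |t| is below 0.1. A small sketch of that screen follows. The function name is hypothetical, and the "<.0001" entries are entered as their upper bound 0.0001, which is an assumption made only so they can be compared numerically.

```python
# Pr > |t| from the estimates table; "<.0001" is entered as the upper bound 0.0001.
p_values = {
    "black": 0.0001,
    "boy": 0.0001,
    "cigsper": 0.0001,
    "ed": 0.5388,
    "m_wtgain": 0.0001,
    "married": 0.0001,
    "mom_age": 0.0001,
    "smoke": 0.0001,
    "visit": 0.0630,
}


def screen(p_values, alpha=0.1):
    """Split variable names into (significant, not significant) at level alpha."""
    significant = [name for name, p in p_values.items() if p < alpha]
    not_significant = [name for name, p in p_values.items() if p >= alpha]
    return significant, not_significant


significant, not_significant = screen(p_values, alpha=0.1)
print("Significant at 0.1:    ", significant)
print("Not significant at 0.1:", not_significant)
```

With a stricter level such as `screen(p_values, alpha=0.05)`, visit would move to the not-significant list as well.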
7.5 A bigger baby makes the Mom more likely to have difficult labor. If that is true, would you
suggest that a woman give birth when she is younger or older?
Younger. The p value for mom_age is smaller than 0.1, and for each additional year of the
mother’s age the baby is predicted to weigh an additional 5.94 grams. So an older mom is more
likely to have a bigger baby and therefore more likely to have a difficult labor.
7.6 A bigger baby makes the Mom more likely to have difficult labor. If that is true, would you
suggest that a Mom smoke or not?
Although a smoking mother is predicted to have a smaller baby (the smoke coefficient is -168.8
grams), smoking may lead to other medical problems. The regression result alone is not enough to
make a suggestion here.
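The answers to 7.2, 7.3, 7.5 and 7.6 all read a coefficient as the predicted change in birth weight when one input changes and everything else stays the same. The sketch below applies that reading to two scenarios. The scenario values (10 years older, 10 cigarettes per day) are hypothetical, chosen only for illustration, and the function name is ours.

```python
# Coefficients (grams per unit of each input) from the estimates table.
COEF = {
    "black": -196.1,
    "boy": 109.8,
    "cigsper": -4.2229,
    "ed": -1.3560,
    "m_wtgain": 8.7199,
    "married": 76.2923,
    "mom_age": 5.9424,
    "smoke": -168.8,
    "visit": 6.4254,
}


def weight_difference(changes):
    """Predicted change in birth weight (grams) when the named inputs change
    by the given amounts and all other inputs are held constant."""
    return sum(COEF[name] * delta for name, delta in changes.items())


# Hypothetical scenario 1: the mother is 10 years older.
older = weight_difference({"mom_age": 10})

# Hypothetical scenario 2: a smoker at 10 cigarettes per day versus a non-smoker.
smoker = weight_difference({"smoke": 1, "cigsper": 10})

print(f"Mother 10 years older:         {older:+.1f} g")
print(f"Smoker, 10 cigarettes per day: {smoker:+.1f} g")
```

Working with differences rather than absolute predictions avoids any assumption about how mom_age and m_wtgain are scaled in the data set.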
Regression Practice 2:
Goal: examine marketing strategies.
1. Create a new project named Ads Investment Analysis
2. Create a new diagram named Ads Investment Analysis
3. Specify the data source: Metadata Repository -> Browse -> ABA1 -> saledata. Change the role
of each variable as follows:
DirectMail, Internet, PrintMedia, and TVRadio are advertising investment variables.
SalesAmount is the sales record under each advertisement strategy.
The goal of this study is to identify the advertising investment that yields the highest return
on investment (ROI).
Check the regression results and answer the following questions:
1. How good is the model?
R square is almost 90%. That means about 90% of the variance in the sales amount can be
explained by the model.
2. What are the effective advertisement strategies?
Direct Mail, Internet and Print Media are effective because their p values are smaller than 0.1
and their estimated coefficients are positive.
3. Which advertisement strategy is the best?
Internet, because it is significant (p value less than 0.1) and has the highest estimated
coefficient.
4. Which advertisement strategy is effective but the worst?
Direct mail. Its p value is significant and its estimated coefficient is positive, but the
coefficient is the lowest among the effective strategies.
5. Which advertisement strategy should be abandoned?
TV/radio should be abandoned since its estimated coefficient is negative, which means that more
TV/radio investment is associated with lower sales.
6. If you are the marketing manager, how will you change the marketing plan?
Shift more investment to Internet advertising until this strategy is saturated.
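The reasoning in answers 2 to 5 can be written as a short rule: keep the strategies with p value below 0.1 and a positive coefficient, rank them by coefficient, and flag any strategy with a negative coefficient. The sketch below shows that rule in Python. The numbers in it are PLACEHOLDERS chosen only to mirror the pattern described in the answers; they are not the regression output of this lab, which is not reproduced in this handout. The function name is hypothetical, and the comparison assumes all four investments are measured in the same units.

```python
def rank_strategies(results, alpha=0.1):
    """results maps strategy name -> (estimated coefficient, p value).

    Returns (effective strategies ordered best first, strategies with a
    negative coefficient)."""
    effective = {
        name: coef
        for name, (coef, p) in results.items()
        if p < alpha and coef > 0
    }
    ranked = sorted(effective, key=effective.get, reverse=True)
    negative = [name for name, (coef, _p) in results.items() if coef < 0]
    return ranked, negative


# PLACEHOLDER values for illustration only, NOT the lab's regression output.
placeholder_results = {
    "DirectMail": (1.0, 0.01),
    "Internet": (3.0, 0.01),
    "PrintMedia": (2.0, 0.01),
    "TVRadio": (-1.0, 0.50),
}

ranked, negative = rank_strategies(placeholder_results)
print("Effective, best first:", ranked)
print("Negative coefficient: ", negative)
```

To use it with the actual lab, replace the placeholder values with the estimates and p values from the regression results.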