Uploaded by Gutu Ziyad

Advanced biostatistics assignment 2

HARAMAYA UNIVERSITY
COLLEGE OF HEALTH SCIENCE AND MEDICINE
SCHOOL OF POSTGRADUATE STUDIES
PROJECT WORK ON RELATIONSHIP BETWEEN NUMBER OF CIGARETTES
SMOKED PER DAY AND DETERMINANTS OF LUNG CANCER
A RESEARCH PROJECT SUBMITTED TO: MR. ADISU (MPH, BIOSTATISTICS)
PREPARED BY:
1. MOHAMMED ADUS
2. REGASSA DADI
3. DAWIT WENGELU
4. TEMESGEN KANSI
5. NEGASH ASEFA
6. DESALEGN ADUGNA
SEP/ 2022
HARAR, ETHIOPIA
Contents
Introduction
Methods and material
Present the descriptive statistics (graph, table, charts, measures of central tendency and dispersion) for important variable
  Measure of central tendency and dispersion
Identify the determinants of “number of cigars smoked each day” (i.e. the uncategorized data) using relevant risk factors (use the appropriate statistical method, please!). Interpret all the relevant statistical outputs you get including its model assumptions
  Interpretation for linear regression
  Model Assumptions
Interpretation for linear regression
Use appropriate model for identifying the factors associated with lung cancer
  Interpretation
Determine the 95% confidence interval for coefficients of variables, check model adequacy and interpret all the relevant outputs from each model you fitted above (Question 5)
Results
  Interpretation
Discussion
Conclusion
References
Introduction
The main research question for this paper is as follows: “Is there a
relationship between the number of cigarettes smoked per day and determinants of lung cancer
(income, household size, sex, marital status, sticks smoked and others)?”
This research question is significant because the results would identify the main
reasons that lead people to smoke different numbers of cigarettes per day. The
dependent variable in the study is the number of cigarettes smoked. In the data provided,
the number of cigarettes is named cigs and ranges from a low of zero to a high of 80.
Cigarette smoking is an unhealthy behavior that is widespread all over the
world. It is the leading cause of premature death. Globally, around twenty percent of
adults smoke cigarettes, which caused roughly a hundred million deaths
during the 20th century. Socioeconomic status (SES) is considered the
most important determinant of smoking behavior. Based on the theory of diffusion
of innovation, four phases of smoking have been described (Qing Wang, 2018). In
the first phase, smoking is taken up by innovators, i.e. the higher socioeconomic groups.
In the second phase, smoking spreads to the entire population (including the lower
socioeconomic groups). The third phase is characterized by the start of quitting among the higher
socioeconomic groups, male dominance, and an increase in female smoking. Lastly,
in the fourth phase, smoking declines among the higher socioeconomic groups but
remains high among those of lower socioeconomic status. Therefore, the effects of SES on smoking
behavior may vary across countries with different levels of socioeconomic development.
Cigarette smoking severely affects the health of individuals with low socioeconomic status.
Lower-income smokers are affected more by smoking-related illnesses than
smokers with higher earnings. They have a higher risk of lung cancer than those from richer
groups. Also, those with less than a high school education have a higher incidence of lung
cancer than those with a college education (CDC, 2021).
In addition, populations with lower income have less contact with health care services, so
many individuals are diagnosed with smoking-related diseases and conditions at
a later stage.
Methodology
This paper explores the effects of months of smoking, age, time of smoking, family size,
income and other variables on cigs, the number of cigarettes an individual smokes in a day.
Statistical analysis was done using Stata/SE version 15. The study included 1250
participants.
Two-sample independent t-tests were used for the continuous outcome variable (average
number of cigarette sticks consumed per day) against categorical variables with two groups (sex,
status and marital status).
One-way ANOVA was used for more than two independent groups (educational status).
Linear regression was used for the continuous outcome variable (average number of cigarette sticks
consumed per day) with independent variables such as months of smoking, age, time of
smoking, family size and others. The linear regression assumptions were assessed as follows:
1. linearity, using a two-way scatter plot; 2. normality, using a kernel density plot;
3. homoscedasticity, using imtest and hettest; and 4. multicollinearity, using vif.
Model selection was checked by backward elimination and stepwise selection.
Finally, logistic regression was used for the binary outcome (lung cancer) with independent
variables such as sticks consumed, time of smoking, age, status, income and others.
The regression model used is as follows:
$$\text{cigs} = \beta_0 + \beta_1\,\text{restaurn} + \beta_2\,\text{cigprice} + \beta_3\,\text{income} + \beta_4\,\text{white} + \beta_5\,\text{age} + \beta_6\,\text{age}^3 + \beta_7\,\text{age}^4$$
The independent variables include restaurn, which captures state smoking restrictions:
if the state does not have smoking restrictions, the number of
cigarettes smoked per day is expected to be higher. Cigprice, the state cigarette price, is expected
to affect cigs negatively, because the lower the price, the more
cigarettes people would smoke daily. Age is expected to affect cigs positively: the
higher the age, the higher the income and the greater the number of cigs smoked. Income is expected to
affect cigs positively: the higher people's income, the more
cigs are smoked.
Description of the data
The data consisted of 11 variables, namely month_smoking, age_years, marital_status,
sticks_consu, educ_status, time_smoking, sex, hh_size, status, income_month and lung_cancer.
Educ represented the level of education, age was the age of the person in years, income
represented the monthly income of the individual in birr, and cigs was the number of cigarettes
smoked per day.
Descriptive Statistics
The median length of time until the resumption of cigarette smoking was 33 months.
In the table above, 16 observations were below the 25th percentile and 70 observations
were above the 95th percentile.
Figure 4 Lung cancer over household size
Figure 6 Box plot for average number of cigarette sticks consumed each day during first
phase.
We used two-sample independent t-tests for sex, censoring status and marital status.
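. ttest sticks_consu, by(sex)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |     604    11.83113     .413118    10.15296     11.0198    12.64245
  Female |     646    11.46904     .412709    10.48963    10.65862    12.27946
---------+--------------------------------------------------------------------
combined |   1,250      11.644    .2920572    10.32578    11.07102    12.21698
---------+--------------------------------------------------------------------
    diff |            .3620856    .5845887               -.7847994    1.508971
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =   0.6194
Ho: diff = 0                                     degrees of freedom =     1248

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.7321         Pr(|T| > |t|) = 0.5358          Pr(T > t) = 0.2679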
• Assumptions of the two-sample independent t-test
• The variances of the dependent variable in the two populations are equal
• The dependent variable is normally distributed within each population
• The data are independent (scores of one participant are not related
systematically to the scores of the others)
• Hypothesis: Ho: μm = μf Vs HA: μm ≠ μf
We conclude that there is no significant difference in the mean number of cigarettes smoked each day between
males and females, because the p value (0.5358) is greater than 0.05. That means we fail to reject the null
hypothesis.
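The test statistic reported for sex can be reproduced from the group summaries alone. The following Python sketch is illustrative and not part of the original Stata analysis; the function name `pooled_t` is a hypothetical helper. It applies the equal-variance (pooled) two-sample formula to the observations, means and standard deviations reported for males and females.

```python
import math

def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Equal-variance two-sample t test computed from summary statistics."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two group variances
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    # Standard error of the difference in means
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / se
    return t, se, df

# Values copied from the ttest output for sex (Male vs Female)
t_stat, se_diff, dof = pooled_t(604, 11.83113, 10.15296, 646, 11.46904, 10.48963)
print(f"t = {t_stat:.4f}, SE = {se_diff:.4f}, df = {dof}")
```

The same function can be applied to the marital status and censoring status summaries by swapping in their group sizes, means and standard deviations.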
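. ttest sticks_consu, by(marital_status)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
Never ma |     639    11.84664    .4280062    10.81933    11.00616    12.68711
 Married |     611    11.43208    .3959561    9.787407    10.65448    12.20968
---------+--------------------------------------------------------------------
combined |   1,250      11.644    .2920572    10.32578    11.07102    12.21698
---------+--------------------------------------------------------------------
    diff |            .4145568    .5843772               -.7319134    1.561027
------------------------------------------------------------------------------
    diff = mean(Never ma) - mean(Married)                         t =   0.7094
Ho: diff = 0                                     degrees of freedom =     1248

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.7609         Pr(|T| > |t|) = 0.4782          Pr(T > t) = 0.2391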
Hypothesis: Ho: μnm = μm Vs HA: μnm ≠ μm
We conclude that there is no significant difference in the mean number of cigarettes smoked each day between
married and never-married respondents, because the p value (0.4782) is greater than 0.05. That means we fail to reject the null
hypothesis.
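. ttest sticks_consu, by(status)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
censored |     896    13.31808    .3635118    10.88109    12.60465    14.03152
 Resumed |     354     7.40678    .3830718    7.207453     6.65339     8.16017
---------+--------------------------------------------------------------------
combined |   1,250      11.644    .2920572    10.32578    11.07102    12.21698
---------+--------------------------------------------------------------------
    diff |            5.911301     .626519                4.682154    7.140447
------------------------------------------------------------------------------
    diff = mean(censored) - mean(Resumed)                         t =   9.4352
Ho: diff = 0                                     degrees of freedom =     1248

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 1.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 0.0000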
• Hypothesis: Ho: μc = μr Vs HA: μc ≠ μr
We conclude that there is a significant difference in the mean number of cigarettes smoked each day between
the censored and resumed groups, because the p value is less than 0.05. That means we reject the null
hypothesis.
One-way ANOVA for more than two populations
. oneway sticks_consu educ_status

                        Analysis of Variance
    Source              SS         df      MS            F     Prob > F
------------------------------------------------------------------------
Between groups      2440.46591      4   610.116477      5.81     0.0001
 Within groups      130730.114   1245   105.004108
------------------------------------------------------------------------
    Total            133170.58   1249   106.621761

Bartlett's test for equal variances:  chi2(4) =  15.5655  Prob>chi2 = 0.004
Assumptions for one-way ANOVA
The outcome is normally distributed.
The population variance is assumed constant among the groups.
The samples are independent random samples from the groups.
Ho : µ1 = µ2 = ... = µk,
HA : at least one of the means is different.
We reject the null hypothesis (p value < 0.05) and
conclude that at least one of the group means of cigarettes smoked each day differs from the others.
Note that Bartlett's test (p = 0.004) suggests the group variances are not equal, so this result should be interpreted with caution.
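As a check on the ANOVA table, the F statistic is the ratio of the between-group mean square to the within-group mean square. The Python sketch below is illustrative only (the function name `anova_f` is hypothetical) and uses the sums of squares and degrees of freedom copied from the oneway output.

```python
def anova_f(ss_between, df_between, ss_within, df_within):
    """Return (MS between, MS within, F) for a one-way ANOVA table."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within

# Sums of squares and degrees of freedom copied from the oneway output
ms_b, ms_w, f_stat = anova_f(2440.46591, 4, 130730.114, 1245)
print(f"MS between = {ms_b:.3f}, MS within = {ms_w:.3f}, F = {f_stat:.2f}")
```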
Comparison of the average number of cigarette sticks consumed each day during the first phase
by Level of education
(Bonferroni)
Row Mean-|
Col Mean |    Did not   High sch   Some col    College
---------+--------------------------------------------
High sch |   -2.09707
         |      0.124
         |
Some col |   -3.23212   -1.13504
         |      0.003      1.000
         |
 College |   -3.94306   -1.84599   -.710946
         |      0.000      0.229      1.000
         |
Post-und |   -3.16846   -1.07139    .063655    .774601
         |      0.149      1.000      1.000      1.000
Now the question is: which groups are different?
Answering this question requires multiple comparisons.
The Bonferroni method corrects the probability of a Type I error for the number of tests.
The following pairwise comparisons are statistically significant at the 0.05 level:
some college vs did not complete high school (p = 0.003), and college vs did not complete high school (p < 0.001).
Identify the determinants of “number of cigars smoked each day” (i.e. the uncategorized
data) using relevant risk factors (use the appropriate statistical method, please!). Interpret all
the relevant statistical outputs you get including its model assumptions.
We used a linear regression model, with the average number of cigarettes
consumed each day as the continuous outcome and the other variables as independent variables.
Variable selection based on significance in multivariable model:
. stepwise, pr(.01): regress sticks_consu month_smoking age_years marital_status educ_status time_smoking sex
> onth
                      begin with full model
p = 0.8380 >= 0.0100  removing income_month
p = 0.7364 >= 0.0100  removing marital_status
p = 0.5188 >= 0.0100  removing sex
p = 0.2628 >= 0.0100  removing educ_status
p = 0.1733 >= 0.0100  removing status

      Source |       SS           df       MS      Number of obs   =     1,250
-------------+----------------------------------   F(4, 1245)      =    320.65
       Model |  67575.7398         4  16893.9349   Prob > F        =    0.0000
    Residual |  65594.8402     1,245  52.6866187   R-squared       =    0.5074
-------------+----------------------------------   Adj R-squared   =    0.5059
       Total |   133170.58     1,249  106.621761   Root MSE        =    7.2586

-------------------------------------------------------------------------------
 sticks_consu |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
month_smoking |   .1399341   .0114573    12.21   0.000     .1174563    .1624118
    age_years |   .4584523    .022728    20.17   0.000      .413863    .5030417
 time_smoking |  -.0829848   .0282136    -2.94   0.003    -.1383363   -.0276334
      hh_size |  -.4173792   .1481988    -2.82   0.005    -.7081262   -.1266321
        _cons |  -10.54407   .9355041   -11.27   0.000    -12.37941   -8.708731
-------------------------------------------------------------------------------
Interpretation for linear regression
Dependent variable: number of cigarettes smoked each day
• For each additional month of smoking, the number of cigarettes smoked each day
increased by 0.14 on average, keeping the other variables constant
• For each additional year of age, the number of cigarettes smoked each day increased
by 0.46 on average, keeping the other variables constant
• For each one-unit increase in time of smoking, the number of cigarettes smoked each day
decreased by 0.08 on average, keeping the other variables constant
• For each additional household member, the number of cigarettes smoked each day
decreased by 0.42 on average, keeping the other variables constant
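Because this is a linear model, each coefficient is a change in the number of cigarettes per day, not a percentage. The Python sketch below illustrates this with the coefficients copied from the regression output; the respondent's values and the helper name `predict_sticks` are hypothetical and are not taken from the data set.

```python
# Coefficients copied from the final stepwise regression output
INTERCEPT = -10.54407
COEFS = {
    "month_smoking": 0.1399341,
    "age_years": 0.4584523,
    "time_smoking": -0.0829848,
    "hh_size": -0.4173792,
}

def predict_sticks(values):
    """Predicted cigarettes per day for a dict of predictor values."""
    return INTERCEPT + sum(COEFS[name] * values[name] for name in COEFS)

# Hypothetical respondent (illustrative values, not from the data set)
person = {"month_smoking": 24, "age_years": 40, "time_smoking": 10, "hh_size": 4}
older = dict(person, age_years=41)  # same respondent, one year older

baseline = predict_sticks(person)
change = predict_sticks(older) - baseline
print(f"predicted sticks per day: {baseline:.2f}")
print(f"change for one extra year of age: {change:.4f}")
```

The difference between the two predictions equals the age_years coefficient, which is what "keeping the other variables constant" means in practice.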
Model Assumptions
Linearity: The relationship between the independent variables and the dependent variable is linear, so the
linearity assumption is met.
Fig 7 Relationship between residuals and the average number of cigarette sticks consumed
each day during first phase
Normality: The error terms are normally distributed, so the normality assumption is met.
Fig 8 Normal distribution of error terms (kernel density estimate of the residuals against the normal density; kernel = epanechnikov, bandwidth = 0.2196)
Fig 9 Distribution of standardized residuals and inverse normal
Homoscedasticity: This assumption requires that the variance of the error terms is constant, i.e. homogeneity of
variance of the residuals.
The homoscedasticity assumption is not met. The variance of the residuals is non-constant, so the
residuals are heteroscedastic.
Fig 10 Homoscedasticity of residuals and fitted values
Multicollinearity: When there is a perfect linear relationship among the predictors, the
estimates cannot be uniquely computed. We can use the vif command after the regression
to check for multicollinearity. As a rule of thumb, a variable whose VIF is greater than
10 may need further investigation. In this case every VIF is less than 10, so there is no evidence of multicollinearity.
. vif

    Variable |       VIF       1/VIF
-------------+----------------------
time_smoking |      1.90    0.525649
   age_years |      1.86    0.537630
month_smok~g |      1.43    0.701662
-------------+----------------------
    Mean VIF |      1.73
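The VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on the other predictors, so 1/VIF (the tolerance) is the share of its variance not explained by the others. The Python sketch below is illustrative only; it uses the rounded VIF values from the table, so the tolerances it implies differ slightly from the 1/VIF column.

```python
# VIF values copied (as rounded) from the vif output
vif = {"time_smoking": 1.90, "age_years": 1.86, "month_smoking": 1.43}

mean_vif = sum(vif.values()) / len(vif)

# Auxiliary R-squared implied by each VIF: R2 = 1 - 1/VIF
aux_r2 = {name: 1 - 1 / value for name, value in vif.items()}

# Rule of thumb used in the text: flag any predictor with VIF above 10
flagged = [name for name, value in vif.items() if value > 10]

print(f"mean VIF = {mean_vif:.2f}")
for name, r2 in aux_r2.items():
    print(f"{name}: implied auxiliary R2 = {r2:.3f}")
print("flagged predictors:", flagged)
```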
4 Interpret all the relevant outputs from each model you fitted above (Question 3)
Interpretation for linear regression
Dependent variable: number of cigarettes smoked each day
• For each additional month of smoking, the number of cigarettes smoked each day
increased by 0.14 on average, keeping the other variables constant
• For each additional year of age, the number of cigarettes smoked each day increased
by 0.46 on average, keeping the other variables constant
• For each one-unit increase in time of smoking, the number of cigarettes smoked each day
decreased by 0.08 on average, keeping the other variables constant
• For each additional household member, the number of cigarettes smoked each day
decreased by 0.42 on average, keeping the other variables constant
Use appropriate model for identifying the factors associated with lung cancer
We first examined whether there is an association between cigarette smoking
(exposure) and lung cancer (outcome), ignoring the other potential confounders.
Stepwise logistic regression using the likelihood ratio test
. logit lung_cancer sticks_consu, or

Iteration 0:   log likelihood = -739.29478
Iteration 1:   log likelihood = -730.84004
Iteration 2:   log likelihood = -730.79539
Iteration 3:   log likelihood = -730.79539

Logistic regression                             Number of obs     =      1,250
                                                LR chi2(1)        =      17.00
                                                Prob > chi2       =     0.0000
Log likelihood = -730.79539                     Pseudo R2         =     0.0115

------------------------------------------------------------------------------
 lung_cancer | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
sticks_consu |   1.024798   .0060488     4.15   0.000     1.013011    1.036723
       _cons |   .2862447   .0281062   -12.74   0.000      .236134    .3469895
------------------------------------------------------------------------------

. est store a
. logit lung_cancer time_smoking sticks_consu

Iteration 0:   log likelihood = -739.29478
Iteration 1:   log likelihood = -702.50812
Iteration 2:   log likelihood = -702.02782
Iteration 3:   log likelihood = -702.02779

Logistic regression                             Number of obs     =      1,250
                                                LR chi2(2)        =      74.53
                                                Prob > chi2       =     0.0000
Log likelihood = -702.02779                     Pseudo R2         =     0.0504

------------------------------------------------------------------------------
 lung_cancer |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
time_smoking |   .0519246   .0069667     7.45   0.000     .0382701    .0655791
sticks_consu |   .0020913    .006876     0.30   0.761    -.0113855    .0155681
       _cons |  -1.595195   .1125756   -14.17   0.000    -1.815839   -1.374551
------------------------------------------------------------------------------

. est store b
. lrtest b a

Likelihood-ratio test                                 LR chi2(1)  =     57.54
(Assumption: a nested in b)                           Prob > chi2 =    0.0000
. logit lung_cancer time_smoking sticks_consu age_years

Iteration 0:   log likelihood = -739.29478
Iteration 1:   log likelihood = -700.20785
Iteration 2:   log likelihood = -699.66674
Iteration 3:   log likelihood = -699.66669

Logistic regression                             Number of obs     =      1,250
                                                LR chi2(3)        =      79.26
                                                Prob > chi2       =     0.0000
Log likelihood = -699.66669                     Pseudo R2         =     0.0536

------------------------------------------------------------------------------
 lung_cancer |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
time_smoking |   .0419965   .0082204     5.11   0.000     .0258849    .0581082
sticks_consu |  -.0071338   .0079904    -0.89   0.372    -.0227948    .0085271
   age_years |    .017522   .0079996     2.19   0.028      .001843    .0332009
       _cons |  -2.117346   .2667801    -7.94   0.000    -2.640225   -1.594467
------------------------------------------------------------------------------

. est store c
. lrtest c b

Likelihood-ratio test                                 LR chi2(1)  =      4.72
(Assumption: b nested in c)                           Prob > chi2 =    0.0298
. logit lung_cancer time_smoking sticks_consu age_years status

Iteration 0:   log likelihood = -739.29478
Iteration 1:   log likelihood = -698.61322
Iteration 2:   log likelihood = -697.99343
Iteration 3:   log likelihood = -697.99328
Iteration 4:   log likelihood = -697.99328

Logistic regression                             Number of obs     =      1,250
                                                LR chi2(4)        =      82.60
                                                Prob > chi2       =     0.0000
Log likelihood = -697.99328                     Pseudo R2         =     0.0559

------------------------------------------------------------------------------
 lung_cancer |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
time_smoking |   .0398203   .0082956     4.80   0.000     .0235612    .0560795
sticks_consu |  -.0090132   .0080604    -1.12   0.263    -.0248113    .0067848
   age_years |   .0172915   .0080068     2.16   0.031     .0015984    .0329847
      status |  -.2929866   .1617566    -1.81   0.070    -.6100237    .0240505
       _cons |   -1.98381   .2755864    -7.20   0.000    -2.523949    -1.44367
------------------------------------------------------------------------------

. est store d
. lrtest d c

Likelihood-ratio test                                 LR chi2(1)  =      3.35
(Assumption: c nested in d)                           Prob > chi2 =    0.0673
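Each likelihood-ratio test above compares two nested models that differ by one variable: the statistic is twice the difference in log likelihoods and is referred to a chi-square distribution with 1 degree of freedom. The Python sketch below is illustrative and not part of the original Stata session; the function name `lr_test_1df` is hypothetical, and it only handles the 1-degree-of-freedom case, where the chi-square tail probability reduces to a complementary error function.

```python
import math

def lr_test_1df(ll_reduced, ll_full):
    """Likelihood-ratio test for nested models differing by one parameter."""
    stat = 2 * (ll_full - ll_reduced)
    # Chi-square survival function with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Final log likelihoods copied from the four stored models a, b, c and d
log_lik = {"a": -730.79539, "b": -702.02779, "c": -699.66669, "d": -697.99328}

results = {
    "b vs a": lr_test_1df(log_lik["a"], log_lik["b"]),  # adds time_smoking
    "c vs b": lr_test_1df(log_lik["b"], log_lik["c"]),  # adds age_years
    "d vs c": lr_test_1df(log_lik["c"], log_lik["d"]),  # adds status
}

for label, (stat, p_value) in results.items():
    print(f"{label}: LR chi2(1) = {stat:.2f}, p = {p_value:.4f}")
```

At the 0.05 level, adding time_smoking and age_years improves the fit, while adding status does not.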
Status, income, household size, sex, marital status, sticks smoked and the other variables did not
improve the model when they were added one at a time.
Finally, we obtained the model below.
. logit lung_cancer age_years month_smoking time_smoking sticks_consu, or

Iteration 0:   log likelihood = -739.29478
Iteration 1:   log likelihood =  -696.7724
Iteration 2:   log likelihood = -696.12937
Iteration 3:   log likelihood = -696.12923
Iteration 4:   log likelihood = -696.12923

Logistic regression                             Number of obs     =      1,250
                                                LR chi2(4)        =      86.33
                                                Prob > chi2       =     0.0000
Log likelihood = -696.12923                     Pseudo R2         =     0.0584

-------------------------------------------------------------------------------
  lung_cancer | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
    age_years |   1.017153   .0081701     2.12   0.034     1.001265    1.033292
month_smoking |   1.010189   .0038516     2.66   0.008     1.002669    1.017767
 time_smoking |   1.036079   .0088738     4.14   0.000     1.018832    1.053618
 sticks_consu |   .9854168   .0083632    -1.73   0.083     .9691608    1.001946
        _cons |   .1002543   .0278405    -8.28   0.000     .0581735    .1727748
-------------------------------------------------------------------------------
INTERPRETATION
• As age increases by one year, the odds of lung cancer increase on average by
1.7%, keeping the other variables constant
• As months of smoking increase by one unit, the odds of lung cancer increase on
average by 1%, keeping the other variables constant
• As time of smoking increases by one unit, the odds of lung cancer increase on
average by 3.6%, keeping the other variables constant
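The percentages in this interpretation follow from the odds ratios as (OR - 1) x 100, and a variable is significant at the 5% level when its 95% confidence interval excludes 1. The Python sketch below is illustrative; the values are copied from the final logistic model and the helper name `percent_change` is hypothetical.

```python
# (odds ratio, lower 95% limit, upper 95% limit) copied from the final model
final_model = {
    "age_years": (1.017153, 1.001265, 1.033292),
    "month_smoking": (1.010189, 1.002669, 1.017767),
    "time_smoking": (1.036079, 1.018832, 1.053618),
    "sticks_consu": (0.9854168, 0.9691608, 1.001946),
}

def percent_change(odds_ratio):
    """Percent change in the odds for a one-unit increase in the predictor."""
    return (odds_ratio - 1) * 100

summary = {}
for name, (odds_ratio, lower, upper) in final_model.items():
    # Significant at the 5% level if the interval does not contain 1
    significant = not (lower <= 1 <= upper)
    summary[name] = (round(percent_change(odds_ratio), 1), significant)

for name, (pct, significant) in summary.items():
    print(f"{name}: {pct:+.1f}% change in odds, significant = {significant}")
```

The interval for sticks_consu contains 1, which is why it is not interpreted alongside the other three variables.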
Determine the 95% confidence interval for coefficients of variables, check model adequacy
and Interpret all the relevant outputs from each model you fitted above (Question 5)
. estat gof

Logistic model for lung_cancer, goodness-of-fit test

       number of observations =      1250
 number of covariate patterns =      1000
            Pearson chi2(991) =   1017.56
                  Prob > chi2 =    0.2722

The goodness-of-fit test shows no evidence of lack of fit (p value 0.2722, greater than 0.05), so the model fits the data adequately.
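For the 95% confidence intervals, Stata computes the interval on the log-odds scale and then exponentiates it; the standard error shown next to an odds ratio is the odds ratio times the standard error of the coefficient. The Python sketch below is an illustrative reconstruction for time_smoking in the final model under those assumptions; the function name `odds_ratio_ci` is hypothetical.

```python
import math
from statistics import NormalDist

def odds_ratio_ci(odds_ratio, se_odds_ratio, level=0.95):
    """Wald confidence interval for an odds ratio, built on the log-odds scale."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    coef = math.log(odds_ratio)                # coefficient on the log-odds scale
    se_coef = se_odds_ratio / odds_ratio       # delta method: SE(OR) = OR * SE(coef)
    return math.exp(coef - z * se_coef), math.exp(coef + z * se_coef)

# Odds ratio and standard error for time_smoking, copied from the final model
lower, upper = odds_ratio_ci(1.036079, 0.0088738)
print(f"95% CI for the time_smoking odds ratio: {lower:.6f} to {upper:.6f}")
```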
Results
This study was conducted on 1250 respondents, of whom 604 were male and 646 were female.
Of these participants, 639 were never married and 611 were married; 896 were censored and
354 resumed smoking.
By educational status, 258 (20.64%) did not complete high school, 356 (28.48%)
completed high school, 264 (21.12%) had some college, 290 (23.2%) had a college degree and
82 (6.56%) had a post-undergraduate degree.
The association of lung cancer with the age of the respondent, using logistic regression, is as follows.
. logit lung_cancer ib(2).cat1age_years, or

Iteration 0:   log likelihood = -739.29478
Iteration 1:   log likelihood = -724.55172
Iteration 2:   log likelihood = -723.71048
Iteration 3:   log likelihood = -723.70486
Iteration 4:   log likelihood = -723.70486

Logistic regression                             Number of obs     =      1,250
                                                LR chi2(2)        =      31.18
                                                Prob > chi2       =     0.0000
Log likelihood = -723.70486                     Pseudo R2         =     0.0211

-------------------------------------------------------------------------------
  lung_cancer | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
cat1age_years |
        Adult |   4.347222   1.632185     3.91   0.000     2.082688    9.074015
        siner |   9.175781   4.152295     4.90   0.000     3.779643     22.2759
              |
        _cons |    .091954   .0339725    -6.46   0.000     .0445752    .1896917
-------------------------------------------------------------------------------

Interpretation:
The odds of lung cancer among seniors were 9.2 times the odds among the young.
Comparing adults and seniors, lung cancer was more frequent among seniors (>= 65 years) than among
adults.
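To make these odds ratios easier to read, they can be turned into predicted probabilities of lung cancer for each age group. The Python sketch below is illustrative only and assumes that the young group is the reference category, so that _cons is the odds of lung cancer in that group; the helper name `odds_to_probability` is hypothetical.

```python
# _cons from the output: odds of lung cancer in the reference group
# (assumed here to be the young group)
BASE_ODDS = 0.091954

# Odds ratios relative to the reference group, copied from the output
ODDS_RATIOS = {"young": 1.0, "adult": 4.347222, "senior": 9.175781}

def odds_to_probability(odds):
    """Convert odds to a probability."""
    return odds / (1 + odds)

predicted = {
    group: odds_to_probability(BASE_ODDS * ratio)
    for group, ratio in ODDS_RATIOS.items()
}

for group, probability in predicted.items():
    print(f"{group}: predicted probability of lung cancer = {probability:.3f}")
```

An odds ratio of 9.2 therefore does not mean the probability is 9.2 times higher; the ratio applies to the odds.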
Model specification bias can arise when a relevant
independent variable is omitted from the model. This is the main reason why
all variables related to the dependent variable were tested first, before eliminating
those that were not significant (Gupta, 2020). The following looks at whether
the model satisfies the OLS assumptions.
The ordinary least squares (OLS) method estimates the parameters of the
regression model by minimizing the sum of squared errors (observed
values - predicted values).
The first assumption is linearity: the dependent variable is linear in the
parameters. The second is random sampling of the observations. A test for
heteroscedasticity is as shown in
Conclusion
We found, as expected, that lung cancer and smoking are positively
associated. Age in years, time of smoking and months of smoking
were positively associated with lung cancer in both the adjusted and
crude analyses (Fig. 1). For example, as time of smoking increases by one unit,
the odds of lung cancer increase on average by 3.6%, and as age increases by one year,
the odds of lung cancer increase on average by 1.7%, at a p value less than 0.05, as the table below shows. This
is similar to research done in many different countries of Africa. Based on
our knowledge of epidemiology, lung cancer is positively associated with the
number of sticks consumed. The crude analysis agreed with this, but in the adjusted model
the association was not statistically significant (p value greater than
0.05). Except for homoscedasticity, most of the linear regression assumptions were met.
The question to be answered was: what are the factors that affect cigs? The project
looked at the factors that could affect cigs, some of which were established
from the regression model. The three variables affected cigs significantly. These
results can be used by the government and healthcare providers to come up with ways of
moderating the number of cigarettes smoked in the country. The same results can be used by
cigarette companies to produce and sell their cigarettes. Further research
should involve more variables, because this study covered only a limited set of independent variables.
Bibliography
CDC. 2021. Cigarette Smoking and Tobacco Use Among People of Low Socioeconomic Status.
https://www.cdc.gov/tobacco/disparities/low-ses/index.htm.
Gupta. 2020. "Specification Bias." https://rlacollege.edu.in/pdf/Statistics/specification-bias.pdf.
Julian Perelman, Joana Alves, Timo‐Kolja Pfoertner, Irene Moor, Bruno Federico, Mirte A. G.
Kuipers, Matthias Richter, Arja Rimpela, Anton E. Kunst, and Vincent Lorant. 2017.
December. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5698771/.
Mullahy. 1997. "Instrumental-Variable Estimation of Count Data Models: Applications to
Models of Cigarette Smoking Behavior." Review of Economics and Statistics 79: 586-593.