Regression models for health policy by examples
Zoltán Vokó
Sándor Kabos
András Lőw
Created by XMLmind XSL-FO Converter .
Chapter 1. Introduction
1. Occurrence relationships in epidemiology
The objective of epidemiological studies is to quantify occurrence relationships:
• how the risk of a disease is related to an exposure,
• how the probability that the disease is present depends on signs, symptoms and findings,
• how the outcome of a disease is related to a treatment,
• how the occurrence of a disease is related to time, space, population, and population characteristics.
Generalized linear models are the most widely used statistical methods for representing these relationships mathematically and for estimating their parameters.
The primary aim of this compilation of examples is to help students learn the appropriate way of fitting statistical models and interpreting their results.
The compilation covers types of analysis that are widely used in epidemiological practice: the analysis of aggregate data and of cross-sectional studies (surveys).
We recommend working through the material in order, because its parts build on each other.
2. Statistical background
Basic statistical knowledge is required (e.g. Faraway[bib_2]).
At the end of each chapter you will find a statistical summary with basic mathematical statistics and a formal statement of the applied models.
The statistical background of the presented regression analysis is the generalized linear model (detailed
needed in fitting statistical models and interpreting their results.
When interpreting a regression model we first examine how well the model fits. As the examples demonstrate, if the model does not fit, its partial results (such as the significance of individual factors) cannot be accepted as valid.
Chapter 2. Poisson regression with categorical predictors
1. Data analysis examples
This chapter analyses Hungary's mortality data for the year 2009. The research question is how mortality depends on gender, age and the population size of the settlement of residence.
Input data:
• number of deaths by 5-year age group, gender and settlement population size category
• resident population (same categories)
Example 2.1. Poisson regression of mortality on age group, gender and settlement population size
call poisson: Y ~ offset(LOGN) + AGE + GENDER + LSZKOD
variable  Incidence density ratio  Coefficient  Coefficient S.E.  z value  Pr(>|z|)
(Intercept)  0,00052  -7,5534  0,05562  -135,806  0
AGE.00-04  2,33484  0,84794  0,0694  12,219  0
AGE.05-09  0,24172  -1,41996  0,14148  -10,03663  0
AGE.10-14  0,31793  -1,14592  0,12354  -9,2756  0
AGE.15-19  0,67058  -0,39961  0,08914  -4,48303  1,00E-05
AGE.25-29  1,10842  0,10294  0,07531  1,36684  0,17168
AGE.30-34  1,64896  0,50014  0,06718  7,44527  0
AGE.35-39  2,86927  1,05406  0,06324  16,66724  0
AGE.40-44  5,54978  1,71376  0,06002  28,55144  0
AGE.45-49  11,49942  2,4423  0,05794  42,15066  0
AGE.50-54  20,58072  3,02435  0,05663  53,40833  0
AGE.55-59  27,22385  3,30409  0,05631  58,67517  0
AGE.60-64  38,51154  3,65096  0,05622  64,93928  0
AGE.65-69  52,60813  3,96287  0,0561  70,64271  0
AGE.70-74  78,95571  4,36889  0,05601  78,00325  0
AGE.75-79  128,03337  4,85229  0,05586  86,86199  0
AGE.80-84  218,1694  5,38527  0,05582  96,47972  0
AGE.85-X  2004,5429  7,60317  0,05546  137,08131  0
GENDER.F  0,4294  -0,84536  0,0042  -201,3572  0
LSZKOD.–999  1,47814  0,39079  0,00846  46,20179  0
LSZKOD.1000–1999  1,48537  0,39567  0,00833  47,49887  0
LSZKOD.2000–4999  1,50867  0,41123  0,00736  55,86308  0
LSZKOD.5000–9999  1,43496  0,36113  0,00842  42,88961  0
LSZKOD.10000–19999  1,36816  0,31346  0,0081  38,68758  0
LSZKOD.20000–49999  1,33951  0,2923  0,00793  36,86062  0
LSZKOD.50000–99999  1,26126  0,23211  0,00968  23,98679  0
LSZKOD.100–300 thousand  1,22477  0,20275  0,00828  24,49597  0
AGE ref.level: .20-24
GENDER ref.level: .MALE
LSZKOD ref.level: .BP
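The columns of the Example 2.1 table are linked by simple formulas: the incidence density ratio is the exponential of the coefficient, and the z value is the coefficient divided by its standard error. The following minimal sketch recomputes these quantities for three rows copied from the table (with decimal points instead of decimal commas). The approximate 95% Wald interval is our addition for illustration and is not reported in the table; the function name `summarize` is hypothetical.

```python
import math

# (coefficient, standard error) pairs copied from the Example 2.1 table
rows = {
    "(Intercept)": (-7.5534, 0.05562),
    "AGE.50-54": (3.02435, 0.05663),
    "GENDER.F": (-0.84536, 0.0042),
}


def summarize(coef, se, z_crit=1.96):
    """Return the IDR, the z value and an approximate 95% Wald interval for the IDR."""
    idr = math.exp(coef)                  # incidence density ratio
    z = coef / se                         # Wald z statistic
    lower = math.exp(coef - z_crit * se)  # interval limits on the IDR scale
    upper = math.exp(coef + z_crit * se)
    return idr, z, lower, upper


for name, (coef, se) in rows.items():
    idr, z, lower, upper = summarize(coef, se)
    print(f"{name:12s} IDR={idr:.5f}  z={z:.2f}  95% CI=({lower:.5f}, {upper:.5f})")
```

Small differences from the printed z values are expected, because the table shows rounded coefficients and standard errors.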
The reference category of the analysis is men aged 20-24 living in Budapest. The model estimates their mortality as the Intercept value of 0,00052 (i.e. 5,2 per 10 000). The other incidence density ratios are relative to this category; e.g. the mortality of men aged 50-54 living in Budapest is estimated as 0,00052*20,58 = 0,0107.
For women aged 50-54 living in a settlement of at most 999 inhabitants the estimate is 0,00052*20,58*1,478*0,4294 ≈ 0,0068. It is important to keep in mind that these are not observed data but estimates, and that a different model fitted to the same data gives different estimates.
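The calculation described above can be sketched in a few lines: the estimated mortality of any category is the baseline (intercept) rate multiplied by the incidence density ratios of the levels that differ from the reference category. The values are taken from the Example 2.1 table; the helper name `predicted_rate` is hypothetical.

```python
# Incidence density ratios (IDR) read off the Example 2.1 table
baseline = 0.00052         # men aged 20-24 living in Budapest (the intercept)
idr_age_50_54 = 20.58072   # AGE.50-54
idr_female = 0.4294        # GENDER.F
idr_under_1000 = 1.47814   # LSZKOD.-999: settlements with at most 999 inhabitants


def predicted_rate(baseline, *idrs):
    """Multiply the baseline rate by the IDR of every non-reference level."""
    rate = baseline
    for idr in idrs:
        rate *= idr
    return rate


men_budapest = predicted_rate(baseline, idr_age_50_54)
women_small = predicted_rate(baseline, idr_age_50_54, idr_under_1000, idr_female)

print(f"men 50-54, Budapest:              {men_budapest:.4f}")
print(f"women 50-54, under 1000 residents: {women_small:.4f}")
```

Because the model has no interaction terms, the same three multipliers apply to every combination of age, gender and settlement size; this is exactly the assumption that Example 2.2 relaxes for age and gender.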
Goodness of fit signif = 0
(resid deviance = 3792,5 , resid df = 297 )
The goodness-of-fit result above shows that the model does not fit. This does not mean that all the estimates of the model are wrong. The main point is that the significance of a specific factor cannot be taken as valid unless the model fits well (i.e. Goodness of fit signif > 0,05).
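The goodness-of-fit check can be reproduced approximately by hand. The sketch below assumes that "Goodness of fit signif" is the upper tail probability of a chi-square distribution with the residual degrees of freedom, evaluated at the residual deviance; this reading is our assumption. To stay within the Python standard library, the chi-square tail is computed with the Wilson-Hilferty normal approximation rather than an exact routine, so the result is approximate. The function name `chi2_upper_tail` is hypothetical.

```python
import math


def chi2_upper_tail(x, df):
    """Approximate P(X > x) for X ~ chi-square(df) (Wilson-Hilferty approximation)."""
    h = 2.0 / (9.0 * df)
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - h)) / math.sqrt(h)
    # upper tail of the standard normal distribution
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Example 2.1: residual deviance 3792.5 on 297 residual degrees of freedom
p_value = chi2_upper_tail(3792.5, 297)
fits = p_value > 0.05
print(f"Goodness of fit signif = {p_value:.4f}, model fits: {fits}")
```

A deviance that is many times larger than its degrees of freedom, as here, gives a tail probability that is numerically zero.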
Example 2.2. Negative binomial regression of mortality on age group, gender, age*gender interaction and settlement population size
call negbin: Y ~ offset(LOGN) + AGE + GENDER + AGE:GENDER + LSZKOD
variable  Incidence density ratio  Coefficient  Coefficient S.E.  z value  Pr(>|z|)
(Intercept)  0,00063  -7,37688  0,06673  -110,5426  0
AGE.00-04  1,59647  0,46779  0,08893  5,26006  0
AGE.05-09  0,18065  -1,71117  0,18443  -9,278  0
AGE.10-14  0,23681  -1,44049  0,16086  -8,95509  0
AGE.15-19  0,57782  -0,54849  0,10937  -5,01514  0
AGE.25-29  1,09534  0,09106  0,08949  1,01756  0,30889
AGE.30-34  1,5239  0,42127  0,08138  5,17668  0
AGE.35-39  2,53666  0,93085  0,07725  12,04994  0
AGE.40-44  5,056  1,62058  0,07324  22,12745  0
AGE.45-49  10,62039  2,36278  0,07078  33,38066  0
AGE.50-54  19,26348  2,95821  0,06922  42,73746  0
AGE.55-59  25,80242  3,25047  0,06884  47,21925  0
AGE.60-64  35,85012  3,57935  0,06877  52,0477  0
AGE.65-69  48,10098  3,8733  0,06867  56,40586  0
AGE.70-74  67,72765  4,21549  0,06866  61,39468  0
AGE.75-79  101,5715  4,62076  0,06853  67,42367  0
AGE.80-84  152,48681  5,02708  0,06863  73,24823  0
AGE.85-X  2013,381  7,60757  0,06782  112,17728  0
GENDER.F  0,30983  -1,17172  0,13452  -8,71039  0
LSZKOD.–999  1,37744  0,32023  0,02026  15,80531  0
LSZKOD.1000–1999  1,33491  0,28886  0,02005  14,40887  0
LSZKOD.2000–4999  1,34887  0,29927  0,01924  15,55678  0
LSZKOD.5000–9999  1,3035  0,26505  0,02012  13,17331  0
LSZKOD.10000–19999  1,21119  0,1916  0,01983  9,66075  0
LSZKOD.20000–49999  1,19949  0,18189  0,01974  9,21476  0
LSZKOD.50000–99999  1,10914  0,10358  0,02113  4,90233  0
LSZKOD.100–300 thousand  1,09874  0,09416  0,02003  4,70207  0
AGE.00-04:GENDER.F  3,03915  1,11158  0,16041  6,92976  0
AGE.05-09:GENDER.F  2,50165  0,91695  0,2969  3,08841  0,00201
AGE.10-14:GENDER.F  2,53023  0,92831  0,26188  3,54486  0,00039
AGE.15-19:GENDER.F  1,72242  0,54373  0,20142  2,6995  0,00694
AGE.25-29:GENDER.F  1,0535  0,05212  0,18197  0,28643  0,77455
AGE.30-34:GENDER.F  1,36086  0,30811  0,16059  1,91861  0,05503
AGE.35-39:GENDER.F  1,57286  0,45289  0,15167  2,98613  0,00283
AGE.40-44:GENDER.F  1,42547  0,3545  0,14578  2,43181  0,01502
AGE.45-49:GENDER.F  1,37262  0,31672  0,14168  2,23544  0,02539
AGE.50-54:GENDER.F  1,31095  0,27075  0,13912  1,94622  0,05163
AGE.55-59:GENDER.F  1,26055  0,23155  0,1385  1,67188  0,09455
AGE.60-64:GENDER.F  1,33435  0,28844  0,13825  2,08632  0,03695
AGE.65-69:GENDER.F  1,40029  0,33668  0,13796  2,44034  0,01467
AGE.70-74:GENDER.F  1,61923  0,48195  0,13774  3,49893  0,00047
AGE.75-79:GENDER.F  1,88321  0,63298  0,13749  4,60385  0
AGE.80-84:GENDER.F  2,27707  0,82289  0,13745  5,98678  0
AGE.85-X:GENDER.F  1,18804  0,1723  0,13681  1,25946  0,20786
AGE ref.level: .20-24
GENDER ref.level: .MALE
LSZKOD ref.level: .BP
Goodness of fit signif = 0,095042
(resid deviance = 311,45 , resid df = 280 )
Model 2.1.2 fits adequately at the 0.05 significance level (Goodness of fit signif = 0,095 > 0.05), because the age*gender interaction has been included and the Poisson model has been replaced by a negative binomial model.
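The residual degrees of freedom reported for the two examples can be checked by counting cells and parameters. The sketch below assumes that the data form a complete cross-classification of 18 age groups, 2 genders and 9 settlement-size categories (the levels listed in the tables plus the reference levels), and that the dispersion parameter of the negative binomial model is not counted among the regression parameters; both are our assumptions, chosen because they reproduce the reported values. The ratio of residual deviance to residual degrees of freedom is added as an informal indicator, with values near 1 usually read as a sign of adequate fit.

```python
# Cells of the cross-classified mortality table (assumed complete)
n_age, n_gender, n_size = 18, 2, 9
n_cells = n_age * n_gender * n_size

# Example 2.1: intercept plus one parameter per non-reference level
params_main = 1 + (n_age - 1) + (n_gender - 1) + (n_size - 1)
# Example 2.2 adds the AGE:GENDER interaction terms
params_interaction = (n_age - 1) * (n_gender - 1)

df_poisson = n_cells - params_main
df_negbin = df_poisson - params_interaction


def dispersion_ratio(deviance, df):
    """Residual deviance per residual degree of freedom."""
    return deviance / df


print("resid df, Example 2.1:", df_poisson)
print("resid df, Example 2.2:", df_negbin)
print("deviance/df, Example 2.1:", round(dispersion_ratio(3792.5, df_poisson), 2))
print("deviance/df, Example 2.2:", round(dispersion_ratio(311.45, df_negbin), 2))
```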
Chapter 3. Regression models with numerical and categorical predictors
(summary)
In this chapter we examine the mortality data of Vas county. Input data:
• number of deaths by settlement, 5-year age group and gender
• population (same division)
• environmental variables by settlement: population size (LSZKOD), distance of the ambulance station from the settlement (mento), unemployment rate (munkanelkarany), number of inhabitants with high-school qualification (kozepisk), number of inhabitants with higher education (felsofoku).
Chapter 4. Fixed and random coefficients regression
(summary)
In this chapter we examine the data of the European Health Interview Survey (Európai Lakossági Egészségfelmérés, ELEF 2009).
ELEF 2009 was the first uniform European health interview survey carried out with the same methodology in all EU member states. The survey took place in the autumn of 2009. The Hungarian sample covered 449 settlements in a two-stage sampling design. Of the 7000 people selected, 5051 answered the survey. The survey collected information on health status (illnesses, accidents, disabilities, work conditions, physical and emotional status), health behaviour (exercise, eating habits, smoking, alcohol consumption, drug abuse), health care use, health-related expenses and socio-economic factors (gender, age, marital status, education, labour market status, income).
Chapter 5. Mapping of regression-based estimates
In this chapter the base model is a mixed Poisson regression in which the random factor is the county (MEGYE). The model contains too many estimated parameters to evaluate one by one, so we use maps for the evaluation. In the course of the analysis we examine the effect of additional random factors, and by examining the effect of the region we arrive at a simple hierarchical model.
Descriptive epidemiological analyses usually examine the place, time and population risk factors of a disease. The possibilities offered by geographic information systems are increasingly used to carry out such analyses. Data collated by geographical location are suitable not only for purely descriptive display but also for the analysis of relationships (e.g. disease detection, monitoring).
1. Data analysis examples
We examine Hungary's 2009 mortality data. The research question is the spatial distribution of mortality and the impact of gender and age.
Input data:
• total mortality by 5-year age group, gender and county of residence
• population (same division)
Example 5.1. Poisson regression, mixed model. Mortality modelled with gender and age group as fixed explanatory variables and county as random explanatory variable
fixed part
variable  Incidence density ratio  Coefficient  Coefficient S.E.  z value  Pr(>|z|)
(Intercept)  0,00077  -7,16346  0,09247  -77,47157  0
AGE.00-04  1,58939  0,46335  0,12369  3,74607  2,00E-004
AGE.05-09  0,18062  -1,71137  0,26487  -6,46128  0
AGE.10-14  0,23744  -1,43784  0,23034  -6,24229  0
AGE.15-19  0,57746  -0,54911  0,15425  -3,55986  4,00E-004
AGE.25-29  1,08709  0,0835  0,12455  0,67044  0,50281
AGE.30-34  1,5106  0,41251  0,11223  3,67569  0,00026
AGE.35-39  2,53017  0,92829  0,10592  8,76403  0
AGE.40-44  5,06703  1,62275  0,09976  16,26586  0
AGE.45-49  10,66057  2,36655  0,096  24,65157  0
AGE.50-54  19,34415  2,96239  0,09361  31,64652  0
AGE.55-59  25,81499  3,25096  0,09304  34,94296  0
AGE.60-64  35,7357  3,57615  0,09294  38,47896  0
AGE.65-69  47,93188  3,86978  0,09278  41,70918  0
AGE.70-74  67,73072  4,21554  0,09277  45,44005  0
AGE.75-79  100,97297  4,61485  0,09258  49,84919  0
AGE.80-84  151,8468  5,02287  0,09271  54,17843  0
AGE.85-X  1206,60272  7,09556  0,09161  77,45365  0
GENDER.F  0,30731  -1,17991  0,19168  -6,15567  0
AGE.00-04:GENDER.F  3,06532  1,12015  0,2269  4,93676  0
AGE.05-09:GENDER.F  2,52204  0,92507  0,42734  2,16471  0,03076
AGE.10-14:GENDER.F  2,55128  0,9366  0,37617  2,48981  0,01302
AGE.15-19:GENDER.F  1,72751  0,54668  0,28749  1,90153  0,05766
AGE.25-29:GENDER.F  1,05763  0,05603  0,25886  0,21645  0,8287
AGE.30-34:GENDER.F  1,36753  0,313  0,22718  1,3778  0,16873
AGE.35-39:GENDER.F  1,58168  0,45849  0,21389  2,1436  0,03243
AGE.40-44:GENDER.F  1,43111  0,35845  0,20512  1,74753  0,08101
AGE.45-49:GENDER.F  1,37626  0,31937  0,19903  1,60459  0,10906
AGE.50-54:GENDER.F  1,3174  0,27566  0,19522  1,41202  0,15841
AGE.55-59:GENDER.F  1,26534  0,23534  0,19431  1,21117  0,22626
AGE.60-64:GENDER.F  1,34924  0,29954  0,19394  1,54449  0,12294
AGE.65-69:GENDER.F  1,41711  0,34862  0,19352  1,8015  0,07208
AGE.70-74:GENDER.F  1,63193  0,48976  0,1932  2,53495  0,01147
AGE.75-79:GENDER.F  1,91226  0,64829  0,19283  3,36194  0,00082
AGE.80-84:GENDER.F  2,30186  0,83372  0,19277  4,32501  2,00E-005
AGE.85-X:GENDER.F  1,97642  0,68129  0,1919  3,55023  0,00041
AGE ref.level: .20-24
GENDER ref.level: .MALE
random part
.Budapest
.Pest
.Fejér
.Komárom-Esztergom
.Veszprém
.Győr-Sopron
.Vas
.Zala
.Baranya
.Somogy
.Tolna
.Borsod-Abaúj-Zempl
.Heves
.Nógrád
.Hajdú-Bihar
.Jász-Nkun-Szolnok
.Szabolcs-Szatmár
.Bács-Kiskun
.Békés
-0,04176
-0,05531
0,00573
0,06831
-0,02113
0,07193
0,02375
0,06502
Estimate
-0,17654
-0,02374
0,02456
0,09173
-0,00481
-0,06121
-0,00681
0,03247
0,06943
-0,02217
-0,00118
0,02317
0,02227
0,02094
0,02145
0,02365
0,01847
0,02161
0,02401
Std.Error
0,01643
0,0175
0,02088
0,02218
0,0215
0,02068
0,01989
0,02047
0,01977
0,01957
0,02037
.Csongrád
Estimate
-0,03826
Std.Error
0,0205
Example 5.2. Hierarchical Poisson regression. Mortality modelled with gender, age group and region as fixed explanatory variables and county as random explanatory variable
fixed part
variable  Incidence density ratio  Coefficient  Coefficient S.E.  z value  Pr(>|z|)
(Intercept)  0,00068  -7,28734  0,34706  -20,99735  0
AGE.00-04  1,59218  0,4651  0,46704  0,99585  0,31968
AGE.05-09  0,18117  -1,70832  1,00102  -1,70815  0,08807
AGE.10-14  0,23832  -1,43414  0,86976  -1,6489  0,09963
AGE.15-19  0,57866  -0,54704  0,58246  -0,93919  0,34797
AGE.25-29  1,08497  0,08155  0,47031  0,17339  0,86239
AGE.30-34  1,50873  0,41127  0,42376  0,97052  0,33213
AGE.35-39  2,5317  0,92889  0,39995  2,3225  0,0205
AGE.40-44  5,07492  1,62431  0,37671  4,31182  2,00E-005
AGE.45-49  10,67228  2,36765  0,3625  6,53145  0
AGE.50-54  19,35025  2,96271  0,35347  8,38176  0
AGE.55-59  25,80901  3,25072  0,35131  9,25321  0
AGE.60-64  35,71514  3,57557  0,35094  10,18867  0
AGE.65-69  47,8534  3,86814  0,35034  11,04109  0
AGE.70-74  67,57056  4,21317  0,35031  12,02704  0
AGE.75-79  100,57403  4,61089  0,34957  13,19015  0
AGE.80-84  150,97558  5,01712  0,35007  14,33156  0
AGE.85-X  1197,94285  7,08836  0,34592  20,49122  0
GENDER.F  0,30695  -1,18107  0,72379  -1,6318  0,10319
REGIO.Közép-Dunántúl  1,17536  0,16157  0,06134  2,63385  0,00864
REGIO.Nyugat-Dunántúl  1,06974  0,06742  0,0623  1,08215  0,27957
REGIO.Dél-Dunántúl  1,15893  0,1475  0,0622  2,37124  0,01801
REGIO.Észak-Magyarország  1,20472  0,18624  0,06006  3,10075  0,00201
REGIO.Észak-Alföld  1,17185  0,15858  0,05832  2,71909  0,00671
REGIO.Dél-Alföld  1,10954  0,10394  0,05851  1,77663  0,07608
AGE.00-04:GENDER.F  3,06913  1,12139  0,85677  1,30887  0,19103
AGE.05-09:GENDER.F  2,5252  0,92632  1,6136  0,57407  0,56611
AGE.10-14:GENDER.F  2,5542  0,93774  1,42042  0,66018  0,50936
AGE.15-19:GENDER.F  1,72913  0,54762  1,08559  0,50444  0,61412
AGE.25-29:GENDER.F  1,058  0,05638  0,97746  0,05768  0,95402
AGE.30-34:GENDER.F  1,36844  0,31367  0,85781  0,36566  0,71473
AGE.35-39:GENDER.F  1,58283  0,45921  0,80763  0,56859  0,56982
AGE.40-44:GENDER.F  1,43114  0,35847  0,77453  0,46282  0,64364
AGE.45-49:GENDER.F  1,37653  0,31957  0,75156  0,4252  0,67082
AGE.50-54:GENDER.F  1,31752  0,27575  0,73717  0,37406  0,70847
AGE.55-59:GENDER.F  1,26518  0,23522  0,73371  0,32059  0,74862
AGE.60-64:GENDER.F  1,34863  0,29909  0,73233  0,40841  0,6831
AGE.65-69:GENDER.F  1,41745  0,34886  0,73073  0,47741  0,63322
AGE.70-74:GENDER.F  1,63348  0,49071  0,72954  0,67263  0,50141
AGE.75-79:GENDER.F  1,91529  0,64987  0,72814  0,89251  0,37244
AGE.80-84:GENDER.F  2,30636  0,83567  0,72789  1,14807  0,25135
AGE.85-X:GENDER.F  1,98076  0,68348  0,72462  0,94322  0,34591
AGE ref.level: .20-24
GENDER ref.level: .MALE
REGIO ref.level: .Közép-Magyarország
random part
county  Estimate  Std.Error
.Budapest  -0,05174  0,03849
.Pest  0,05174  0,03849
.Fejér  -0,00455  0,04107
.Komárom-Esztergom  0,0197  0,04156
.Veszprém  -0,01515  0,0413
.Győr-Sopron  -0,00279  0,04121
.Vas  0,00345  0,04208
.Zala  -0,00066  0,04175
.Baranya  -0,00586  0,04127
.Somogy  0,01813  0,04144
.Tolna  -0,01228  0,04224
.Borsod-Abaúj-Zemplén  0,00813  0,03996
.Heves  -0,01157  0,04112
.Nógrád  0,00344  0,04216
.Hajdú-Bihar  -0,01753  0,0399
.Jász-Nagykun-Szolnok  3,00E-004  0,04021
.Szabolcs-Szatmár  0,01724  0,03984
.Bács-Kiskun  -0,00074  0,03979
.Békés  0,00822  0,0402
.Csongrád  -0,00748  0,04027
Variable county is hierarchically embedded in variable region as follows:
region  county
Közép-Magyarország  Budapest, Pest
Közép-Dunántúl  Fejér, Komárom-Esztergom, Veszprém
Nyugat-Dunántúl  Győr-Sopron, Vas, Zala
Dél-Dunántúl  Baranya, Somogy, Tolna
Észak-Magyarország  Borsod-Abaúj-Zemplén, Heves, Nógrád
Észak-Alföld  Hajdú-Bihar, Jász-Nagykun-Szolnok, Szabolcs-Szatmár
Dél-Alföld  Bács-Kiskun, Békés, Csongrád
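The nesting of counties within regions can be made concrete with a small sketch. It assumes that in Example 5.2 the linear predictor adds the fixed coefficient of the region and the random-part estimate of the county nested in it, so that a county's incidence density ratio relative to an average county of the reference region (Közép-Magyarország) is the exponential of their sum. This reading of the model is our assumption. Only two regions are shown; the values are copied from the fixed and random parts of Example 5.2, and the names `region_of`, `region_coef`, `county_effect` and `county_idr` are hypothetical.

```python
import math

# Region of each county (only two regions shown)
region_of = {
    "Budapest": "Közép-Magyarország",
    "Pest": "Közép-Magyarország",
    "Fejér": "Közép-Dunántúl",
    "Komárom-Esztergom": "Közép-Dunántúl",
    "Veszprém": "Közép-Dunántúl",
}

# Fixed-part coefficients of REGIO (the reference region has coefficient 0)
region_coef = {"Közép-Magyarország": 0.0, "Közép-Dunántúl": 0.16157}

# Random-part estimates of the counties
county_effect = {
    "Budapest": -0.05174,
    "Pest": 0.05174,
    "Fejér": -0.00455,
    "Komárom-Esztergom": 0.0197,
    "Veszprém": -0.01515,
}


def county_idr(county):
    """IDR of a county: exp(region coefficient + nested county effect)."""
    log_idr = region_coef[region_of[county]] + county_effect[county]
    return math.exp(log_idr)


for county in region_of:
    print(f"{county:18s} {county_idr(county):.3f}")
```

Under this reading the counties of one region share the region coefficient and differ only through their comparatively small county effects.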
Example 5.3. Poisson regression. Mortality modelled with gender, age group and region as fixed explanatory variables and county*gender*age (2 groups) as random explanatory variables
fixed part
variable  Incidence density ratio  Coefficient  Coefficient S.E.  z value  Pr(>|z|)
(Intercept)  0,00078  -7,15859  0,08301  -86,2351  0
AGE.00-04  1,59329  0,4658  0,10937  4,25883  2,00E-005
AGE.05-09  0,18078  -1,71048  0,2342  -7,30341  0
AGE.10-14  0,23722  -1,43875  0,20368  -7,06396  0
AGE.15-19  0,57697  -0,54997  0,13639  -4,0322  6,00E-005
AGE.25-29  1,0893  0,08553  0,11013  0,77661  0,43769
AGE.30-34  1,51713  0,41682  0,09924  4,20014  3,00E-005
AGE.35-39  2,53994  0,93214  0,09366  9,95211  0
AGE.40-44  5,07895  1,6251  0,08822  18,42167  0
AGE.45-49  10,66252  2,36674  0,08489  27,88101  0
AGE.50-54  19,33157  2,96174  0,08277  35,78133  0
AGE.55-59  25,84247  3,25202  0,08227  39,52974  0
AGE.60-64  35,78517  3,57753  0,08218  43,53171  0
AGE.65-69  47,23767  3,85519  0,08638  44,62836  0
AGE.70-74  66,82966  4,20215  0,08637  48,65094  0
AGE.75-79  99,48389  4,6  0,08621  53,35728  0
AGE.80-84  149,20587  5,00533  0,08633  57,98004  0
AGE.85-X  1182,76266  7,07561  0,08541  82,84178  0
GENDER.F  0,30096  -1,20079  0,17171  -6,99319  0
AGE.00-04:GENDER.F  3,05752  1,11761  0,20064  5,57023  0
AGE.05-09:GENDER.F  2,52642  0,9268  0,37788  2,45267  0,01445
AGE.10-14:GENDER.F  2,56568  0,94222  0,33263  2,83266  0,00477
AGE.15-19:GENDER.F  1,73553  0,55132  0,25422  2,16869  0,03049
AGE.25-29:GENDER.F  1,05155  0,05026  0,2289  0,21959  0,82626
AGE.30-34:GENDER.F  1,35516  0,30392  0,20089  1,51284  0,13083
AGE.35-39:GENDER.F  1,57186  0,45226  0,18913  2,39121  0,01709
AGE.40-44:GENDER.F  1,42679  0,35542  0,18138  1,9596  0,05049
AGE.45-49:GENDER.F  1,37772  0,32043  0,176  1,82066  0,06914
AGE.50-54:GENDER.F  1,32005  0,27767  0,17263  1,60852  0,10823
AGE.55-59:GENDER.F  1,26126  0,23211  0,17182  1,3509  0,17722
AGE.60-64:GENDER.F  1,34192  0,2941  0,1715  1,71491  0,08686
AGE.65-69:GENDER.F  1,47544  0,38895  0,17538  2,21773  0,02694
AGE.70-74:GENDER.F  1,69597  0,52825  0,17511  3,01672  0,00266
AGE.75-79:GENDER.F  1,99159  0,68893  0,17479  3,9415  9,00E-005
AGE.80-84:GENDER.F  2,4072  0,87846  0,17474  5,02732  0
AGE.85-X:GENDER.F  2,0762  0,73054  0,174  4,19859  3,00E-005
AGE ref.level: .20-24
GENDER ref.level: .MALE
random part
age (2 groups): AGE 64 and younger, 65 and older
MEGYE  GENDER  AGGE  Coefficient  exp()
.Budapest  .M  .-64  -0,213925  0,810314
.Budapest  .M  .65+  -0,094141  0,910665
.Budapest  .F  .-64  -0,056986  0,946663
.Budapest  .F  .65+  -0,247406  0,78316
.Pest  .M  .-64  -0,089623  0,917565
.Pest  .M  .65+  -0,021785  0,979
.Pest  .F  .-64  -0,021977  0,980392
.Pest  .F  .65+  -0,008711  0,994294
.Fejér  .M  .-64  0,000008  1,003677
.Fejér  .M  .65+  0,024822  1,025709
.Fejér  .F  .-64  -0,01569  0,986574
.Fejér  .F  .65+  0,033056  1,036701
.Vas
.Vas
.Vas
.Zala
.Zala
.Zala
.Zala
.Baranya
MEGYE
.Komárom-
Esztergom
.Komárom-
Esztergom
.Komárom-
Esztergom
.Komárom-
Esztergom
.Veszprém
.Veszprém
.Veszprém
.Veszprém
.Győr-Sopron
.Győr-Sopron
.Győr-Sopron
.Győr-Sopron
.Vas
.Baranya
.Baranya
.Baranya
.F
.M
.F
.F
.M
.F
.F
.M
.M
.M
.F
.F
.M
.M
.M
.F
.F
.M
.F
.F
.M
GENDER
.M
.M
.F
AGGE
.-64
Coefficient
0,077849
.65+
.-64
.65+
.-64
.65+
.-64
.65+
.-64
.65+
.-64
.-64
.65+
.-64
.65+
.-64
.65+
.65+
.-64
.65+
.-64
.65+
.-64
.65+
0,042885
0,027264
0,127112
-0,060216
-0,024939
-0,003444
0,025449
-0,093541
-0,018875
-0,116642
-0,074526
-0,038448
-0,010211
-0,050228
-0,063834
-0,092797
-0,006371
-0,106337
-0,070847
-0,02283
0,000535
0,021527
0,010782
exp()
1,084848
1,044404
1,029875
1,138942
0,89184
0,930961
0,965744
0,990397
0,953082
0,940968
0,914658
0,944949
0,975917
0,99873
1,028845
0,913977
0,981853
0,994207
0,901078
0,934392
0,980945
1,001097
1,023984
1,013866
.M
.F
.M
.F
.F
.M
GENDER
.M
.M
.F
.F
.M
.F
.M
.F
.F
.M
.F
.F
.M
.M
.M
.F
.F
.M
.Borsod-Abaúj-
Zemplén
.Borsod-Abaúj-
Zemplén
.Borsod-Abaúj-
Zemplén
.Heves
.Heves
.Heves
.Heves
.Nógrád
.Nógrád
.Nógrád
.Nógrád
.Hajdú-Bihar
.Hajdú-Bihar
.Hajdú-Bihar
.Hajdú-Bihar
MEGYE
.Somogy
.Somogy
.Somogy
.Somogy
.Tolna
.Tolna
.Tolna
.Tolna
.Borsod-Abaúj-
Zemplén
.65+
.-64
.65+
.65+
.-64
.65+
.-64
.65+
.-64
.65+
.-64
.65+
.-64
.65+
.-64
.65+
.-64
.65+
.-64
AGGE
.-64
.65+
.-64
.65+
.-64
Coefficient
0,05887
0,030198
0,069468
0,085356
-0,014433
-0,020551
-0,066871
-0,015509
0,171443
0,03615
0,145759
0,047318
0,067652
0,034561
-0,00225
0,00033
0,037194
0,032501
0,033666
0,087144
-0,053384
0,005554
-0,050188
0,002241
exp()
1,064453
1,031237
1,07427
1,092364
0,989217
0,980209
0,937351
0,987557
1,191287
1,037393
1,159435
1,051593
1,073842
1,035746
0,999924
1,003323
1,041628
1,033615
1,036489
1,094319
0,951426
1,006134
0,95312
1,005243
MEGYE GENDER
.Jász-Nkun-Szolnok .M
.Jász-Nkun-Szolnok .M
.Jász-Nkun-Szolnok .F
.Jász-Nkun-Szolnok .F
.Szabolcs-Szatmár .M
AGGE
.-64
.65+
.-64
.65+
.-64
Coefficient
0,078645
0,002624
0,095987
0,025507
0,072532 exp()
1,085712
1,003191
1,10314
1,028905
1,079096
.Szabolcs-Szatmár .M
.Szabolcs-Szatmár .F
.Szabolcs-Szatmár .F
.Bács-Kiskun .M
.Bács-Kiskun
.Bács-Kiskun
.Bács-Kiskun
.Békés
.M
.F
.F
.M
.Békés
.Békés
.Békés
.Csongrád
.M
.F
.F
.M
.65+
.-64
.65+
.-64
.65+
.-64
.65+
.-64
0,048643
-0,009299
0,085568
0,041893
-0,03363
0,00419
-0,039088
0,060994
1,050435
0,9929
1,092596
1,046535
0,967472
1,006384
0,964544
1,066716
.65+
.-64
.65+
.-64
-0,00322
0,094299
-0,035073
-0,059777
0,997345
1,10128
0,968424
0,945363
.Csongrád
.Csongrád
.M
.F
.65+
.-64
-0,035977
-0,03572
0,965204
0,96701
.Csongrád .F
.65+ -0,034631 0,968852
Comparing the estimates of models 5.1.1 and 5.1.2 is not as simple as in Chapter 4, for two reasons. First, the data structure is slightly more complicated: places of residence are classified by region and, within region, by county. Second, in example 5.1.2 the random factor enters a more sophisticated model; for the statistical summary, see subsection 5.2. This is a very simple example of a hierarchical (also called "multilevel") analysis. In a hierarchical model the nested structure concerns the stochastic part of the model: randomness is assumed not only for the outcome variable but also for the parameter(s) that determine the distribution of the outcome variable. This far-reaching generalization of the random factor model is an active area of research in modern statistics, where simulation-based estimation of hierarchical models has led to more flexible statistical procedures that fit the needs of applications.
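One common way to write down a hierarchical Poisson model of the kind used in example 5.1.2 is sketched below. The notation is ours and is not taken from the statistical summary: $i$ indexes the age group, $j$ the gender, $k$ the county and $r(k)$ the region containing county $k$; $N_{ijk}$ is the population at risk (the offset), and the normal distribution of the county effect $u_k$ is an assumption.

```latex
\begin{aligned}
Y_{ijk} \mid u_k &\sim \mathrm{Poisson}(\mu_{ijk}), \\
\log \mu_{ijk} &= \log N_{ijk} + \beta_0 + \alpha_i + \gamma_j + (\alpha\gamma)_{ij} + \rho_{r(k)} + u_k, \\
u_k &\sim \mathcal{N}(0, \sigma_u^2).
\end{aligned}
```

Here $\alpha_i$, $\gamma_j$, $(\alpha\gamma)_{ij}$ and $\rho_{r(k)}$ are fixed effects (zero at the reference levels), while $u_k$ is the random county effect; dropping $\rho_{r(k)}$ gives a mixed model of the form used in example 5.1.1.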
2. Examples of analysis of geographic data
In this section we examine the association between the model parameter estimates obtained in exercise 5.1 and the raw mortality data.
We draw county maps of the model parameter estimates and examine how closely these maps resemble the age- and gender-specific mortality maps.
Figure 5.3.1. Mortality rate by county (/100 000 persons)
Figure 5.3.2. Age distribution of the counties (population of each county is taken as 100%)
The above maps are based on the crude mortality data published by the Central Statistical Office (CSO; KSH in Hungarian). The three maps look different, but all of them differ from the county-specific estimates obtained in model 5.1.1.
The crude mortality rates for the total population and for women are in the same category for Szabolcs-Szatmár and Győr-Sopron counties, and the crude mortality rate of men is better in Szabolcs-Szatmár county. In the Poisson model, on the other hand, the estimate for Győr-Sopron county is much better than that for Szabolcs-Szatmár county. The reason is the different age distribution shown above. The following maps show the proportion of elderly people by county.
Figure 5.3.3. Proportion of persons at age 45 years or more by county
Figure 5.3.4. Proportion of persons at age 65 years or more by county
Figure 5.3.5. Proportion of persons at age 85 years or more by county
One can see that the proportion of persons aged 45 years or more is in the same category for Szabolcs-Szatmár and Győr-Sopron counties, but the proportion of persons aged 65 years or more differs. The population of Győr-Sopron county is older, and Poisson regression takes this into account: the incidence rate ratio estimated by the Poisson model is adjusted for the different age distributions.
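Why crude rates and age-adjusted estimates can disagree is easiest to see on a toy example. In the sketch below all numbers are made up for illustration and do not come from the KSH data: two hypothetical counties have identical age-specific death rates, but one of them has a larger share of elderly inhabitants. The function name `crude_rate` is hypothetical.

```python
# Hypothetical age-specific death rates (deaths per person per year), same in both counties
age_specific_rate = {"-64": 0.004, "65+": 0.05}

# Hypothetical populations by age group
population = {
    "younger county": {"-64": 90_000, "65+": 10_000},
    "older county": {"-64": 80_000, "65+": 20_000},
}


def crude_rate(pop_by_age, rate_by_age):
    """Total expected deaths divided by total population."""
    deaths = sum(pop_by_age[age] * rate_by_age[age] for age in pop_by_age)
    return deaths / sum(pop_by_age.values())


for county, pop in population.items():
    per_100k = crude_rate(pop, age_specific_rate) * 100_000
    print(f"{county}: {per_100k:.0f} deaths per 100 000 persons")
```

The crude rate of the older county is higher although the age-specific mortality is the same in both. A model that contains age as an explanatory variable compares counties within age groups, so in this toy example it would attribute no difference to the county itself.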
Figure 5.3.6. Proportion of men at age 45 years or more by county
Figure 5.3.7. Proportion of men at age 65 years or more by county
Figure 5.3.8. Proportion of men at age 85 years or more by county
Figure 5.3.9. Proportion of women at age 45 years or more by county
Figure 5.3.10. Proportion of women at age 65 years or more by county
Figure 5.3.11. Proportion of women at age 85 years or more by county
Comparing Szabolcs-Szatmár and Győr-Sopron, one can see that among men it is especially the proportion aged 85 years or more that differs. Since mortality is very high in this age group, this explains the higher crude mortality rate of men in Győr-Sopron county.
It can also be seen that the proportion of elderly people is very high in Budapest and very low in Pest county.
Figure 5.3.12. Age specific mortality estimated in the Poisson model for Budapest and Bács-Kiskun county
Figure 5.3.13. Age specific mortality estimated in the Poisson model for Győr-Sopron and Szabolcs-Szatmár counties
Figure 5.3.14. Age specific mortality estimated in the Poisson model for Vas and Nógrád counties
Table 5.1. Table 5.3.1. Mortality of men (/100 000 persons) in 2009 (Source: KSH)
Budapest
Pest
Fejér
Komárom-
Esztergom
Veszprém
Győr-Moson-
Sopron
Vas
Zala
Baranya
AGE:.45-54
934
1098
1222
1267
1167
1064
1083
1087
1150
AGE:.55-64
1925
2152
2214
2639
2164
2153
2392
2021
2236
AGE:.65-74 AGE:.75-84
3378 7525
4126
4696
8989
9666
4921
4213
4147
4151
3997
4383
9944
8719
9140
9486
9921
9442
AGE:.85-X
16450
17839
17484
20074
17339
20647
17535
18634
17598
Somogy
AGE:.45-54
1366
Tolna 1114
Borsod-Abaúj-
Zemplén
1435
Heves
Nógrád
1319
1211
Hajdú-Bihar 1122
Jász-Nagykun-
Szolnok
1325
Szabolcs-
Szatmár-Bereg
1242
Bács-Kiskun 1216
Békés 1275
Csongrád 1069
2574
2564
2226
2441
AGE:.55-64
2377
2345
2716
2552
2372
2593
2231
4664
4555
4434
4532
AGE:.65-74 AGE:.75-84
4491 10439
4314
4979
8714
9653
5061
4158
4650
3913
10225
10314
9620
9340
10103
8568
9022
8897
AGE:.85-X
20564
15956
18802
20748
20331
17600
17027
18950
19168
16857
19160
Table 5.2. Table 5.3.2. Mortality of women (/100 000 persons) in 2009 (Source: KSH)
county  AGE:.45-54  AGE:.55-64  AGE:.65-74  AGE:.75-84  AGE:.85-X
Budapest  492  897  1867  5343  14547
Pest  472  918  1959  6119  15776
Fejér  498  888  2055  5904  16208
Komárom-Esztergom  525  947  2384  6775  15859
Veszprém  462  912  1975  6241  15226
Győr-Moson-Sopron  390  775  1814  5879  15638
Vas  392  863  1831  5805  17246
Zala  377  773  2038  5867  16196
Baranya  545  946  1989  6036  16106
Somogy  555  1036  2283  6484  17326
Tolna  366  873  2070  6207  14967
Borsod-Abaúj-Zemplén  526  1090  2314  6256  14999
Heves  456  919  2207  6327  15416
Nógrád  486  966  2237  6785  17176
Hajdú-Bihar  465  814  1838  6123  16027
Jász-Nagykun-Szolnok  588  1031  2135  6033  16860
Szabolcs-Szatmár-Bereg  455  931  2224  6459  16716
Bács-Kiskun  465  908  1925  5831  16106
Békés  516  1033  2152  5917  15253
Csongrád  480  870  1987  6104  16064
The Poisson model (like any other parametric model) expresses the associations within its own structure. We selected some counties and plotted their estimated age-specific mortality.
These figures show the mechanism by which differences in age distribution (or, more generally, in the distribution of any explanatory variable) are taken into account by the Poisson model.
Figure 5.3.15. Mortality rate (/100 000 persons) at age 45 years or more.
Figure 5.3.16. Mortality rate (/100 000 persons) at age 65 years or more.
Figure 5.3.17. Mortality rate (/100 000 persons) at age 85 years or more.
Figure 5.3.18. Mortality rate (/100 000 men) by county
Figure 5.3.19. Mortality rate (/100 000 men) at age 45 years or more.
Figure 5.3.20. Mortality rate (/100 000 men) at age 65 years or more.
Figure 5.3.21. Mortality rate (/100 000 men) at age 85 years or more.
Figure 5.3.22. Mortality rate (/100 000 women) by county
Figure 5.3.23. Mortality rate (/100 000 women) at age 45 years or more.
Figure 5.3.24. Mortality rate (/100 000 women) at age 65 years or more.
Figure 5.3.25. Mortality rate (/100 000 women) at age 85 years or more.
The age- and gender-specific mortality rates bring us closer to understanding the county-specific incidence rate ratios estimated by the 5.1.1 Poisson model. Probably the most surprising result is that Budapest has by far the best value, which is due to the very favorable mortality of women aged 85 years or more.
It is important to understand that this phenomenon can partly be seen on the crude mortality maps as well, and that the Poisson model reflects it only if it can also be seen there.
Figure 5.3.26. IDR components (/1000) as calculated by model 5.1.1.
Figure 5.3.27. IDR components (/1000) as calculated by model 5.1.2.
REGIO (region) is a fixed variable of the hierarchical Poisson model and MEGYE (county) is a nested random factor. The figures demonstrate the “shrinkage” effect of model 5.1.2. compared with model 5.1.1.: some counties moved closer to each other within a region.
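The idea of shrinkage can be illustrated with a minimal sketch in base R. It does not refit model 5.1.2. and it is not the algorithm used by the hglm package. It assumes a simple normal approximation on the log-rate scale, with simulated counts for six hypothetical counties of one region. All names (pop, deaths, tau2, B) are illustrative assumptions.

```r
# Simulated counts for six hypothetical counties within one region.
set.seed(1)
pop       <- c(2e4, 5e4, 1e5, 2e5, 4e5, 8e5)            # population at risk
true_rate <- 0.012 * exp(rnorm(length(pop), sd = 0.10)) # assumed true rates
deaths    <- rpois(length(pop), lambda = pop * true_rate)

raw  <- log(deaths / pop)              # county-specific log rates, no pooling
v    <- 1 / deaths                     # approximate variance of a Poisson log rate
mu   <- weighted.mean(raw, w = 1 / v)  # precision-weighted regional mean
tau2 <- max(0, var(raw) - mean(v))     # moment estimate of between-county variance

B      <- tau2 / (tau2 + v)            # shrinkage factor, between 0 and 1
shrunk <- mu + B * (raw - mu)          # estimates pulled toward the regional mean

print(data.frame(pop, deaths,
                 raw_rate    = exp(raw) * 1000,
                 shrunk_rate = exp(shrunk) * 1000,
                 B           = round(B, 2)))
```

Counties with few deaths have a large sampling variance v, hence a small factor B, and their estimates are pulled most strongly toward the regional mean. This is the same qualitative behaviour as the counties moving closer to each other within a region in the figures.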
In brief: the mixed model 5.1.1. can be generalized in two ways: first as a hierarchical model (model 5.1.2.), and second by involving multiple random factors (model 5.1.3.). Goodness-of-fit tests can be performed by bootstrap (Gelman, R package mi[bib_4]).
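A bootstrap goodness-of-fit check can be sketched in base R as follows. This is a generic parametric bootstrap of the Pearson statistic and is an assumption made for illustration; it is not the procedure implemented in the mi package. The data are simulated, the variable names Y, LOGN and AGE follow Appendix A, and the number of replicates (200) is an arbitrary choice.

```r
# Simulated data: 30 cells in three age groups.
set.seed(7)
d <- data.frame(AGE = gl(3, 10, labels = c("45-64", "65-84", "85+")),
                N   = sample(5000:50000, 30, replace = TRUE))
d$LOGN <- log(d$N)
d$Y    <- rpois(nrow(d), d$N * c(0.005, 0.03, 0.15)[d$AGE])

fit <- glm(Y ~ offset(LOGN) + AGE, family = poisson(link = "log"), data = d)

# Pearson chi-square statistic of a fitted model.
pearson <- function(m) sum(residuals(m, type = "pearson")^2)
obs_stat <- pearson(fit)

# Parametric bootstrap: simulate counts from the fitted model, refit,
# and recompute the statistic.
n_boot <- 200
mu_hat <- fitted(fit)
boot_stat <- replicate(n_boot, {
  d_star   <- d
  d_star$Y <- rpois(nrow(d_star), mu_hat)
  pearson(glm(Y ~ offset(LOGN) + AGE,
              family = poisson(link = "log"), data = d_star))
})

# Share of simulated statistics at least as large as the observed one.
p_value <- mean(boot_stat >= obs_stat)
print(c(observed = obs_stat, p_value = p_value))
```

A small p-value would indicate that the observed data deviate from the fitted model more than data generated by the model itself typically do, for example because of overdispersion.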
References
[bib_1] Agresti A. 1990. Categorical Data Analysis. Wiley.
[bib_2] Faraway J.J. 2002. Practical Regression and Anova using R. http://cran.r-project.org/doc/contrib/Faraway-PRA.pdf.
[bib_3] Gelman A. and Hill J. 2007. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge Univ. Press.
[bib_4] Gelman A., Hill J., and Yajima M. M.G.: Missing Data Imputation and Model Checking. http://CRAN.R-project.org/package=mi.
[bib_5] Hoeting J.A., Madigan D., and Raftery A.E. 1999. Bayesian model averaging: A tutorial with discussion. Statistical Science, 14. 382-417. http://projecteuclid.org/DPubS/Repository/1.0/Disseminate?view=body&id=pdf_1&handle=euclid.ss/1009212519.
[bib_6] Raftery A.E., Volinsky C.T., Hoeting J.A., Painter I., and Ka Yee Yeung. BMA: Bayesian Model Averaging. http://CRAN.R-project.org/package=BMA.
[bib_7] McLeod A.I. and Xu C. Best Subset GLM. http://CRAN.R-project.org/package=bestglm.
[bib_8] Snijders T.A.B. 2005. Fixed and Random Effects. In: B.S. Everitt and D.C. Howell (eds.), Encyclopedia of Statistics in Behavioral Science. Volume 2. 664-665. Wiley.
Appendix A. Appendix: R scripts of regression models
These R scripts can be downloaded from http://web.tatk.elte.hu/~eregr/
The data sets are from the open database of the Hungarian Central Statistical Office (KSH).
1. Fixed coefficient Poisson model
    glm(Y ~ offset(LOGN) + AGE + GENDER + MEGYE,
        family = poisson(link = "log"),
        data = read.csv2(file = "http://web.tatk.elte.hu/~eregr/vmort.csv"))
2. Fixed and random coefficient Poisson model
    require(hglm)
    hglm(fixed = Y ~ offset(LOGN) + AGE + GENDER,
         random = ~ 1 | MEGYE,
         family = poisson(link = "log"),
         rand.family = gaussian(link = "identity"),
         data = read.csv2(file = "http://web.tatk.elte.hu/~eregr/vmort.csv"))
3. Fixed coefficient negative-binomial model
    require(MASS)
    glm.nb(Y ~ offset(LOGN) + AGE + GENDER + MEGYE,
           data = read.csv2(file = "http://web.tatk.elte.hu/~eregr/vmort.csv"))
4. Fixed coefficient logistic-binomial model
    glm(cbind(Y, N - Y) ~ AGE + GENDER + MEGYE,
        family = binomial(link = "logit"),
        data = read.csv2(file = "http://web.tatk.elte.hu/~eregr/vmort.csv"))
5. Fixed and random coefficient logistic-binomial model
    require(hglm)
    hglm(fixed = Y/N ~ AGE + GENDER,
         random = ~ 1 | MEGYE,
         family = binomial(link = "logit"),
         rand.family = gaussian(link = "identity"),
         data = read.csv2(file = "http://web.tatk.elte.hu/~eregr/vmort.csv"))
6. Fixed and multiple random coefficient Poisson model
    require(hglm)
    hglm(fixed = Y ~ offset(LOGN) + AGE + GENDER + AGE:GENDER,
         random = ~ 1 | MEGYE:GENDER:AGE,
         family = poisson(link = "log"),
         rand.family = gaussian(link = "identity"),
         method = "REML",
         data = read.csv2(file = "http://web.tatk.elte.hu/~eregr/vmort.csv"))
7. Hierarchical Poisson model
    require(hglm)
    hglm(fixed = Y ~ offset(LOGN) + AGE + GENDER + AGE:GENDER + REGIO,
         random = ~ 1 | MEGYE,
         disp = ~ REGIO,
         family = poisson(link = "log"),
         rand.family = gaussian(link = "identity"),
         method = "REML",
         data = read.csv2(file = "http://web.tatk.elte.hu/~eregr/vmort.csv"))