
Regression models for health policy by examples

Zoltán Vokó

Sándor Kabos

András Lőw

Created by XMLmind XSL-FO Converter .


Table of Contents

1. Introduction
   1. Occurrence relationships in epidemiology
   2. Statistical background
2. Poisson regression with categorical predictors
   1. Data analysis examples
3. Regression models with numerical and categorical predictors
4. Fixed and random coefficients regression
5. Mapping of regression-based estimates
   1. Data analysis examples
   2. Examples of analysis of geographic data
References
A. Appendix: R scripts of regression models
   1. Fixed coefficient Poisson model
   2. Fixed and random coefficient Poisson model
   3. Fixed coefficient negative-binomial model
   4. Fixed coefficient logistic-binomial model
   5. Fixed and random coefficient logistic-binomial model
   6. Fixed and multiple random coefficient Poisson model
   7. Hierarchical Poisson model


List of Tables

5.1. Table 5.3.1. Mortality of men (/100 000 persons) in 2009 (Source: KSH)

5.2. Table 5.3.2. Mortality of women (/100 000 persons) in 2009 (Source: KSH)


List of Examples

2.1. Poisson regression of mortality on age group, gender and population size of the settlement

2.2. Negative binomial regression of mortality on age group, gender, age*gender interaction and population size of the settlement

5.1. Poisson regression, mixed model. Mortality; gender and age group as fixed explanatory variables, county as random explanatory variable

5.2. Hierarchical Poisson regression. Mortality; gender, age group and region as fixed explanatory variables, county as random explanatory variable

5.3. Poisson regression. Mortality; gender, age group and region as fixed explanatory variables, county*gender*age (2 groups) as random explanatory variables


Chapter 1. Introduction

1. Occurrence relationships in epidemiology

The objective of epidemiological studies is to quantify occurrence relationships:

• how is the risk of a disease related to an exposure,

• how does the probability of the presence of the disease depend on signs, symptoms, findings,

• how is the outcome of a disease related to a treatment,

• how is the occurrence of a disease related to time, space, population, and characteristics of the populations.

Generalized linear models are the most widely used statistical methods to mathematically represent these relationships and to estimate their parameters.

The primary aim of this compilation of examples is to help students learn the appropriate way to fit statistical models and to interpret their results.

The compilation covers types of analysis that are widely used in epidemiological practice: the analysis of aggregate data and of cross-sectional studies (surveys).

We recommend working through the material in order, because its parts build on each other.

2. Statistical background

Basic statistical knowledge is required (e.g. Faraway[bib_2]).

At the end of each chapter you will find a statistical summary with basic information on mathematical statistics and a formal statement of the models applied.

The statistical background of the regression analyses presented here is the generalized linear model (for a detailed description see Gelman-Hill, chapter 6 [bib_3]). The technical details are not easy, so we cover only as much as is needed for fitting statistical models and interpreting their results.

When interpreting a regression model we examine its fit. As the examples demonstrate, if the model does not fit, its partial results cannot be accepted as valid.
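For orientation, the Poisson regression models fitted in the following chapters can be written in the general form below. This is a sketch in notation introduced here, not taken from the source: $Y_i$ is the number of deaths in cell $i$ of the aggregated data, $N_i$ is the corresponding population, and $x_{ij}$ are the indicator variables of the categorical predictors. It assumes that the offset variable in the model calls is the logarithm of the population.

```latex
Y_i \sim \mathrm{Poisson}(\mu_i), \qquad
\log \mu_i = \log N_i + \beta_0 + \sum_{j} \beta_j x_{ij}, \qquad
\mathrm{IDR}_j = e^{\beta_j}
```

The term $\log N_i$ enters with a fixed coefficient of 1 (an offset), so $e^{\beta_0}$ is the mortality rate of the reference category and each $e^{\beta_j}$ is an incidence density ratio relative to it.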


Chapter 2. Poisson regression with categorical predictors

1. Data analysis examples

This chapter analyses Hungary's mortality data for the year 2009. The research question is how mortality depends on gender, age and the population size of the settlement of residence.

Input data:

• number of deaths by 5-year age group, gender and population size category of the settlement

• resident population (same categories)

Example 2.1. Poisson regression of mortality on age group, gender and population size of the settlement

call poisson: Y ~ offset(LOGN) + AGE + GENDER + LSZKOD

Variable                  Incidence density ratio  Coefficient  Coefficient S.E.  z value     Pr(>|z|)
(Intercept)               0,00052                  -7,5534      0,05562           -135,806    0
AGE.00-04                 2,33484                  0,84794      0,0694            12,219      0
AGE.05-09                 0,24172                  -1,41996     0,14148           -10,03663   0
AGE.10-14                 0,31793                  -1,14592     0,12354           -9,2756     0
AGE.15-19                 0,67058                  -0,39961     0,08914           -4,48303    1,00E-05
AGE.25-29                 1,10842                  0,10294      0,07531           1,36684     0,17168
AGE.30-34                 1,64896                  0,50014      0,06718           7,44527     0
AGE.35-39                 2,86927                  1,05406      0,06324           16,66724    0
AGE.40-44                 5,54978                  1,71376      0,06002           28,55144    0
AGE.45-49                 11,49942                 2,4423       0,05794           42,15066    0
AGE.50-54                 20,58072                 3,02435      0,05663           53,40833    0
AGE.55-59                 27,22385                 3,30409      0,05631           58,67517    0
AGE.60-64                 38,51154                 3,65096      0,05622           64,93928    0
AGE.65-69                 52,60813                 3,96287      0,0561            70,64271    0
AGE.70-74                 78,95571                 4,36889      0,05601           78,00325    0
AGE.75-79                 128,03337                4,85229      0,05586           86,86199    0
AGE.80-84                 218,1694                 5,38527      0,05582           96,47972    0
AGE.85-X                  2004,5429                7,60317      0,05546           137,08131   0
GENDER.F                  0,4294                   -0,84536     0,0042            -201,3572   0
LSZKOD.–999               1,47814                  0,39079      0,00846           46,20179    0
LSZKOD.1000–1999          1,48537                  0,39567      0,00833           47,49887    0
LSZKOD.2000–4999          1,50867                  0,41123      0,00736           55,86308    0
LSZKOD.5000–9999          1,43496                  0,36113      0,00842           42,88961    0
LSZKOD.10000–19999        1,36816                  0,31346      0,0081            38,68758    0
LSZKOD.20000–49999        1,33951                  0,2923       0,00793           36,86062    0
LSZKOD.50000–99999        1,26126                  0,23211      0,00968           23,98679    0
LSZKOD.100-300 ezer       1,22477                  0,20275      0,00828           24,49597    0

(ezer = thousand)

AGE ref.level: .20-24

GENDER ref.level: .MALE

LSZKOD ref.level= .BP
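The columns of the table above are linked by simple relations that can be checked numerically: the incidence density ratio is the exponential of the coefficient, the z value is the coefficient divided by its standard error, and Pr(>|z|) is the two-sided tail probability of the standard normal distribution. The following Python sketch (Python is used for illustration only, and the function name `wald_summary` is hypothetical) recomputes two rows of Example 2.1 from their coefficients and standard errors, written with decimal points.

```python
import math

def wald_summary(coef, se):
    """Return (incidence density ratio, z value, two-sided p-value)."""
    idr = math.exp(coef)                     # ratio relative to the reference level
    z = coef / se                            # Wald z statistic
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided standard normal tail probability
    return idr, z, p

# Coefficient and Coefficient S.E. copied from the table of Example 2.1
rows = {
    "AGE.25-29": (0.10294, 0.07531),
    "GENDER.F": (-0.84536, 0.00420),
}

for name, (coef, se) in rows.items():
    idr, z, p = wald_summary(coef, se)
    print(f"{name:10s} IDR={idr:.5f}  z={z:.3f}  p={p:.5f}")
```

Small differences from the printed z values are to be expected, because the table shows rounded coefficients and standard errors.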

The reference category of the analysis is men aged 20-24 living in Budapest. The model estimates their mortality as Intercept = 0,00052 (that is, 5,2 per 10 000). The other incidence density ratios are expressed relative to this category; e.g. the mortality of men aged 50-54 living in Budapest is estimated as 0,00052*20,58 = 0,0107.

In a settlement of at most 999 inhabitants, the corresponding value for women aged 50-54 is 0,00052*20,58*1,47*0,4294 = 0,0067. It is important to know that these are not observed data but estimates, and that different models give different estimates from the same data.
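The arithmetic above can be written as a short Python sketch: the estimated rate of any category is the baseline rate multiplied by the incidence density ratio of every level that differs from the reference category. The values are copied from Example 2.1; the names `BASELINE`, `IDR` and `estimated_rate` are introduced here for illustration only.

```python
# Values copied from Example 2.1 (reference category: men, aged 20-24, Budapest)
BASELINE = 0.00052  # (Intercept): estimated mortality of the reference category
IDR = {
    "AGE.50-54": 20.58072,
    "GENDER.F": 0.4294,
    "LSZKOD.-999": 1.47814,
}

def estimated_rate(*levels):
    """Multiply the baseline rate by the ratio of every non-reference level."""
    rate = BASELINE
    for level in levels:
        rate *= IDR[level]
    return rate

men_budapest = estimated_rate("AGE.50-54")
women_small = estimated_rate("AGE.50-54", "LSZKOD.-999", "GENDER.F")

print(f"men 50-54, Budapest: {men_budapest:.4f}")
print(f"women 50-54, settlement of at most 999 inhabitants: {women_small:.4f}")
```

With the unrounded ratios the second value comes out marginally higher than with the rounded factors used in the text.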

Goodness of fit signif = 0

(resid deviance = 3792,5 , resid df = 297 )

The above example shows that the model does not fit. This does not mean that all the estimates of the model are wrong.

The main point is that the significance of a specific factor cannot be taken as valid unless the model fits well (i.e. Goodness of fit signif > 0,05).

Example 2.2. Negative binomial regression of mortality on age group, gender, age*gender interaction and population size of the settlement

call negbin: Y ~ offset(LOGN) + AGE + GENDER + AGE:GENDER + LSZKOD

Variable                  Incidence density ratio  Coefficient  Coefficient S.E.  z value     Pr(>|z|)
(Intercept)               0,00063                  -7,37688     0,06673           -110,5426   0
AGE.00-04                 1,59647                  0,46779      0,08893           5,26006     0
AGE.05-09                 0,18065                  -1,71117     0,18443           -9,278      0
AGE.10-14                 0,23681                  -1,44049     0,16086           -8,95509    0
AGE.15-19                 0,57782                  -0,54849     0,10937           -5,01514    0
AGE.25-29                 1,09534                  0,09106      0,08949           1,01756     0,30889
AGE.30-34                 1,5239                   0,42127      0,08138           5,17668     0
AGE.35-39                 2,53666                  0,93085      0,07725           12,04994    0
AGE.40-44                 5,056                    1,62058      0,07324           22,12745    0
AGE.45-49                 10,62039                 2,36278      0,07078           33,38066    0
AGE.50-54                 19,26348                 2,95821      0,06922           42,73746    0
AGE.55-59                 25,80242                 3,25047      0,06884           47,21925    0
AGE.60-64                 35,85012                 3,57935      0,06877           52,0477     0
AGE.65-69                 48,10098                 3,8733       0,06867           56,40586    0
AGE.70-74                 67,72765                 4,21549      0,06866           61,39468    0
AGE.75-79                 101,5715                 4,62076      0,06853           67,42367    0
AGE.80-84                 152,48681                5,02708      0,06863           73,24823    0
AGE.85-X                  2013,381                 7,60757      0,06782           112,17728   0
GENDER.F                  0,30983                  -1,17172     0,13452           -8,71039    0
LSZKOD.–999               1,37744                  0,32023      0,02026           15,80531    0
LSZKOD.1000–1999          1,33491                  0,28886      0,02005           14,40887    0
LSZKOD.2000–4999          1,34887                  0,29927      0,01924           15,55678    0
LSZKOD.5000–9999          1,3035                   0,26505      0,02012           13,17331    0
LSZKOD.10000–19999        1,21119                  0,1916       0,01983           9,66075     0
LSZKOD.20000–49999        1,19949                  0,18189      0,01974           9,21476     0
LSZKOD.50000–99999        1,10914                  0,10358      0,02113           4,90233     0
LSZKOD.100-300 ezer       1,09874                  0,09416      0,02003           4,70207     0
AGE.00-04:GENDER.F        3,03915                  1,11158      0,16041           6,92976     0
AGE.05-09:GENDER.F        2,50165                  0,91695      0,2969            3,08841     0,00201
AGE.10-14:GENDER.F        2,53023                  0,92831      0,26188           3,54486     0,00039
AGE.15-19:GENDER.F        1,72242                  0,54373      0,20142           2,6995      0,00694
AGE.25-29:GENDER.F        1,0535                   0,05212      0,18197           0,28643     0,77455
AGE.30-34:GENDER.F        1,36086                  0,30811      0,16059           1,91861     0,05503
AGE.35-39:GENDER.F        1,57286                  0,45289      0,15167           2,98613     0,00283
AGE.40-44:GENDER.F        1,42547                  0,3545       0,14578           2,43181     0,01502
AGE.45-49:GENDER.F        1,37262                  0,31672      0,14168           2,23544     0,02539
AGE.50-54:GENDER.F        1,31095                  0,27075      0,13912           1,94622     0,05163
AGE.55-59:GENDER.F        1,26055                  0,23155      0,1385            1,67188     0,09455
AGE.60-64:GENDER.F        1,33435                  0,28844      0,13825           2,08632     0,03695
AGE.65-69:GENDER.F        1,40029                  0,33668      0,13796           2,44034     0,01467
AGE.70-74:GENDER.F        1,61923                  0,48195      0,13774           3,49893     0,00047
AGE.75-79:GENDER.F        1,88321                  0,63298      0,13749           4,60385     0
AGE.80-84:GENDER.F        2,27707                  0,82289      0,13745           5,98678     0
AGE.85-X:GENDER.F         1,18804                  0,1723       0,13681           1,25946     0,20786

(ezer = thousand)

AGE ref.level: .20-24

GENDER ref.level: .MALE

LSZKOD ref.level= .BP

Goodness of fit signif = 0,095042

(resid deviance = 311,45 , resid df = 280 )

Model 2.1.2 (Example 2.2) fits acceptably at the 0,05 significance level (goodness of fit signif = 0,095 > 0,05), because the age*gender interaction has been included and the Poisson model has been replaced by a negative binomial model.
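The two goodness-of-fit figures can be compared with a short Python sketch. It assumes, as the quoted numbers suggest but the text does not state, that "Goodness of fit signif" is the upper tail probability of a chi-square distribution evaluated at the residual deviance with the residual degrees of freedom. To stay within the standard library it uses the Wilson-Hilferty normal approximation rather than an exact chi-square routine; the function name `chi2_upper_tail` is hypothetical.

```python
import math

def chi2_upper_tail(x, df):
    """Approximate P(X > x) for a chi-square variable (Wilson-Hilferty)."""
    h = 2.0 / (9.0 * df)
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - h)) / math.sqrt(h)
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail of the standard normal

# Residual deviance and residual degrees of freedom quoted in the text
models = {
    "Example 2.1 (Poisson)": (3792.5, 297),
    "Example 2.2 (negative binomial)": (311.45, 280),
}

for name, (deviance, df) in models.items():
    p = chi2_upper_tail(deviance, df)
    print(f"{name}: deviance/df = {deviance / df:.2f}, signif = {p:.4f}")
```

A residual deviance many times larger than its degrees of freedom, as in Example 2.1, signals lack of fit; in Example 2.2 the two are of similar size.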


Chapter 3. Regression models with numerical and categorical predictors

(summary)

In this chapter we examine the mortality data of Vas county.

Input data:

• mortality by settlement, 5-year age group and gender

• population (same division)

• environmental variables by settlement: population size (LSZKOD), distance of the settlement from the ambulance station (mento), unemployment rate (munkanelkarany), number of inhabitants with high-school qualification (kozepisk), number of inhabitants with higher education (felsofoku).


Chapter 4. Fixed and random coefficients regression

(summary)

In this chapter we examine the data of the Európai Lakossági Egészségfelmérés, the European Health Interview Survey (ELEF 2009).

ELEF 2009 was the first uniform European health interview survey carried out with the same methodology in all EU member states. The survey was conducted in autumn 2009. The Hungarian sample covered 449 settlements in a two-stage sampling framework. Of the 7000 people selected, 5051 answered the survey. The survey collected information on health status (illnesses, accidents, disabilities, work conditions, physical and emotional status); health behaviour (exercise, eating habits, smoking, alcohol consumption, drug abuse); health care use; health-related expenses; and socio-economic factors (gender, age, marital status, academic background, labour market status, income).


Chapter 5. Mapping of regression-based estimates

In this chapter the base model is a mixed Poisson regression in which the random factor is county (MEGYE). The model contains too many estimated parameters to assess in tabular form, so we use maps to evaluate them. In the course of the analysis we examine the effect of additional random parameters, and by examining the effect of the region we arrive at a simple hierarchical model.

Descriptive epidemiological analysis usually examines the occurrence of disease by place, time and population risk factors. The possibilities offered by geographic information systems are increasingly used to carry out such analyses. Data collated by geographical location are suitable not only for purely descriptive display but also for the analysis of relationships (e.g. disease detection, monitoring).

1. Data analysis examples

We examine Hungary's 2009 mortality data. The research question is the spatial distribution of mortality and the impact of gender and age.

Input data:

• total mortality by 5-year age group, sex and county of residence

• population (same division)

Example 5.1. Poisson regression, mixed model. Mortality; gender and age group as fixed explanatory variables, county as random explanatory variable

fixed part

Variable                  Incidence density ratio  Coefficient  Coefficient S.E.  z value     Pr(>|z|)
(Intercept)               0,00077                  -7,16346     0,09247           -77,47157   0
AGE.00-04                 1,58939                  0,46335      0,12369           3,74607     2,00E-004
AGE.05-09                 0,18062                  -1,71137     0,26487           -6,46128    0
AGE.10-14                 0,23744                  -1,43784     0,23034           -6,24229    0
AGE.15-19                 0,57746                  -0,54911     0,15425           -3,55986    4,00E-004
AGE.25-29                 1,08709                  0,0835       0,12455           0,67044     0,50281
AGE.30-34                 1,5106                   0,41251      0,11223           3,67569     0,00026
AGE.35-39                 2,53017                  0,92829      0,10592           8,76403     0
AGE.40-44                 5,06703                  1,62275      0,09976           16,26586    0
AGE.45-49                 10,66057                 2,36655      0,096             24,65157    0
AGE.50-54                 19,34415                 2,96239      0,09361           31,64652    0
AGE.55-59                 25,81499                 3,25096      0,09304           34,94296    0
AGE.60-64                 35,7357                  3,57615      0,09294           38,47896    0
AGE.65-69                 47,93188                 3,86978      0,09278           41,70918    0
AGE.70-74                 67,73072                 4,21554      0,09277           45,44005    0
AGE.75-79                 100,97297                4,61485      0,09258           49,84919    0
AGE.80-84                 151,8468                 5,02287      0,09271           54,17843    0
AGE.85-X                  1206,60272               7,09556      0,09161           77,45365    0
GENDER.F                  0,30731                  -1,17991     0,19168           -6,15567    0
AGE.00-04:GENDER.F        3,06532                  1,12015      0,2269            4,93676     0
AGE.05-09:GENDER.F        2,52204                  0,92507      0,42734           2,16471     0,03076
AGE.10-14:GENDER.F        2,55128                  0,9366       0,37617           2,48981     0,01302
AGE.15-19:GENDER.F        1,72751                  0,54668      0,28749           1,90153     0,05766
AGE.25-29:GENDER.F        1,05763                  0,05603      0,25886           0,21645     0,8287
AGE.30-34:GENDER.F        1,36753                  0,313        0,22718           1,3778      0,16873
AGE.35-39:GENDER.F        1,58168                  0,45849      0,21389           2,1436      0,03243
AGE.40-44:GENDER.F        1,43111                  0,35845      0,20512           1,74753     0,08101
AGE.45-49:GENDER.F        1,37626                  0,31937      0,19903           1,60459     0,10906
AGE.50-54:GENDER.F        1,3174                   0,27566      0,19522           1,41202     0,15841
AGE.55-59:GENDER.F        1,26534                  0,23534      0,19431           1,21117     0,22626
AGE.60-64:GENDER.F        1,34924                  0,29954      0,19394           1,54449     0,12294
AGE.65-69:GENDER.F        1,41711                  0,34862      0,19352           1,8015      0,07208
AGE.70-74:GENDER.F        1,63193                  0,48976      0,1932            2,53495     0,01147
AGE.75-79:GENDER.F        1,91226                  0,64829      0,19283           3,36194     0,00082
AGE.80-84:GENDER.F        2,30186                  0,83372      0,19277           4,32501     2,00E-005
AGE.85-X:GENDER.F         1,97642                  0,68129      0,1919            3,55023     0,00041

AGE ref.level: .20-24

GENDER ref.level: .MALE

random part

.Budapest

.Pest

.Fejér

.Komárom-Esztergom

.Veszprém

.Győr-Sopron

.Vas

.Zala

.Baranya

.Somogy

.Tolna

.Borsod-Abaúj-Zempl

.Heves

.Nógrád

.Hajdú-Bihar

.Jász-Nkun-Szolnok

.Szabolcs-Szatmár

.Bács-Kiskun

.Békés

-0,04176

-0,05531

0,00573

0,06831

-0,02113

0,07193

0,02375

0,06502

Estimate

-0,17654

-0,02374

0,02456

0,09173

-0,00481

-0,06121

-0,00681

0,03247

0,06943

-0,02217

-0,00118


0,02317

0,02227

0,02094

0,02145

0,02365

0,01847

0,02161

0,02401

Std.Error

0,01643

0,0175

0,02088

0,02218

0,0215

0,02068

0,01989

0,02047

0,01977

0,01957

0,02037

.Csongrád   Estimate -0,03826   Std.Error 0,0205

Example 5.2. Hierarchical Poisson regression. Mortality; gender, age group and region as fixed explanatory variables, county as random explanatory variable

fixed part

Variable                  Incidence density ratio  Coefficient  Coefficient S.E.  z value     Pr(>|z|)
(Intercept)               0,00068                  -7,28734     0,34706           -20,99735   0
AGE.00-04                 1,59218                  0,4651       0,46704           0,99585     0,31968
AGE.05-09                 0,18117                  -1,70832     1,00102           -1,70815    0,08807
AGE.10-14                 0,23832                  -1,43414     0,86976           -1,6489     0,09963
AGE.15-19                 0,57866                  -0,54704     0,58246           -0,93919    0,34797
AGE.25-29                 1,08497                  0,08155      0,47031           0,17339     0,86239
AGE.30-34                 1,50873                  0,41127      0,42376           0,97052     0,33213
AGE.35-39                 2,5317                   0,92889      0,39995           2,3225      0,0205
AGE.40-44                 5,07492                  1,62431      0,37671           4,31182     2,00E-005
AGE.45-49                 10,67228                 2,36765      0,3625            6,53145     0
AGE.50-54                 19,35025                 2,96271      0,35347           8,38176     0
AGE.55-59                 25,80901                 3,25072      0,35131           9,25321     0
AGE.60-64                 35,71514                 3,57557      0,35094           10,18867    0
AGE.65-69                 47,8534                  3,86814      0,35034           11,04109    0
AGE.70-74                 67,57056                 4,21317      0,35031           12,02704    0
AGE.75-79                 100,57403                4,61089      0,34957           13,19015    0
AGE.80-84                 150,97558                5,01712      0,35007           14,33156    0
AGE.85-X                  1197,94285               7,08836      0,34592           20,49122    0
GENDER.F                  0,30695                  -1,18107     0,72379           -1,6318     0,10319
REGIO.Közép-Dunántúl      1,17536                  0,16157      0,06134           2,63385     0,00864
REGIO.Nyugat-Dunántúl     1,06974                  0,06742      0,0623            1,08215     0,27957
REGIO.Dél-Dunántúl        1,15893                  0,1475       0,0622            2,37124     0,01801
REGIO.Észak-Magyarország  1,20472                  0,18624      0,06006           3,10075     0,00201
REGIO.Észak-Alföld        1,17185                  0,15858      0,05832           2,71909     0,00671
REGIO.Dél-Alföld          1,10954                  0,10394      0,05851           1,77663     0,07608
AGE.00-04:GENDER.F        3,06913                  1,12139      0,85677           1,30887     0,19103
AGE.05-09:GENDER.F        2,5252                   0,92632      1,6136            0,57407     0,56611
AGE.10-14:GENDER.F        2,5542                   0,93774      1,42042           0,66018     0,50936
AGE.15-19:GENDER.F        1,72913                  0,54762      1,08559           0,50444     0,61412
AGE.25-29:GENDER.F        1,058                    0,05638      0,97746           0,05768     0,95402
AGE.30-34:GENDER.F        1,36844                  0,31367      0,85781           0,36566     0,71473
AGE.35-39:GENDER.F        1,58283                  0,45921      0,80763           0,56859     0,56982
AGE.40-44:GENDER.F        1,43114                  0,35847      0,77453           0,46282     0,64364
AGE.45-49:GENDER.F        1,37653                  0,31957      0,75156           0,4252      0,67082
AGE.50-54:GENDER.F        1,31752                  0,27575      0,73717           0,37406     0,70847
AGE.55-59:GENDER.F        1,26518                  0,23522      0,73371           0,32059     0,74862
AGE.60-64:GENDER.F        1,34863                  0,29909      0,73233           0,40841     0,6831
AGE.65-69:GENDER.F        1,41745                  0,34886      0,73073           0,47741     0,63322
AGE.70-74:GENDER.F        1,63348                  0,49071      0,72954           0,67263     0,50141
AGE.75-79:GENDER.F        1,91529                  0,64987      0,72814           0,89251     0,37244
AGE.80-84:GENDER.F        2,30636                  0,83567      0,72789           1,14807     0,25135
AGE.85-X:GENDER.F         1,98076                  0,68348      0,72462           0,94322     0,34591

AGE ref.level: .20-24

GENDER ref.level: .MALE

REGIO ref.level: .Közép-Magyarország

random part

County                    Estimate     Std.Error
.Budapest                 -0,05174     0,03849
.Pest                     0,05174      0,03849
.Fejér                    -0,00455     0,04107
.Komárom-Esztergom        0,0197       0,04156
.Veszprém                 -0,01515     0,0413
.Győr-Sopron              -0,00279     0,04121
.Vas                      0,00345      0,04208
.Zala                     -0,00066     0,04175
.Baranya                  -0,00586     0,04127
.Somogy                   0,01813      0,04144
.Tolna                    -0,01228     0,04224
.Borsod-Abaúj-Zemplén     0,00813      0,03996
.Heves                    -0,01157     0,04112
.Nógrád                   0,00344      0,04216
.Hajdú-Bihar              -0,01753     0,0399
.Jász-Nkun-Szolnok        3,00E-004    0,04021
.Szabolcs-Szatmár         0,01724      0,03984
.Bács-Kiskun              -0,00074     0,03979
.Békés                    0,00822      0,0402
.Csongrád                 -0,00748     0,04027

Variable county is hierarchically embedded (nested) in variable region as follows:

region                    county
Közép-Magyarország        Budapest, Pest
Közép-Dunántúl            Fejér, Komárom-Esztergom, Veszprém
Nyugat-Dunántúl           Győr-Sopron, Vas, Zala
Dél-Dunántúl              Baranya, Somogy, Tolna
Észak-Magyarország        Borsod-Abaúj-Zemplén, Heves, Nógrád
Észak-Alföld              Hajdú-Bihar, Jász-Nagykun-Szolnok, Szabolcs-Szatmár
Dél-Alföld                Bács-Kiskun, Békés, Csongrád
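How the fixed and random parts of Example 5.2 combine into a county-level estimate can be sketched in Python. The sketch assumes that the random county estimates are on the same logarithmic scale as the fixed coefficients, so that they act as multipliers after exponentiation; the values are copied from Example 5.2 and the names `BASELINE`, `REGION_IDR`, `COUNTY` and `county_rate` are introduced here for illustration only.

```python
import math

BASELINE = 0.00068  # (Intercept) of Example 5.2: men aged 20-24, reference region

# Fixed region ratios and random county estimates copied from Example 5.2
REGION_IDR = {"Közép-Magyarország": 1.0, "Közép-Dunántúl": 1.17536}
COUNTY = {
    "Budapest": ("Közép-Magyarország", -0.05174),
    "Pest": ("Közép-Magyarország", 0.05174),
    "Fejér": ("Közép-Dunántúl", -0.00455),
}

def county_rate(county):
    """Baseline rate x fixed region ratio x exp(random county estimate)."""
    region, u = COUNTY[county]
    return BASELINE * REGION_IDR[region] * math.exp(u)

for name in COUNTY:
    print(f"{name}: {county_rate(name) * 100000:.1f} per 100 000")
```

Under this reading, the two counties of the reference region differ from the regional baseline by about 5% in opposite directions, while Fejér is governed almost entirely by the fixed effect of its region.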

Example 5.3. Poisson regression. Mortality; gender, age group and region as fixed explanatory variables, county*gender*age (2 groups) as random explanatory variables

fixed part

Variable                  Incidence density ratio  Coefficient  Coefficient S.E.  z value     Pr(>|z|)
(Intercept)               0,00078                  -7,15859     0,08301           -86,2351    0
AGE.00-04                 1,59329                  0,4658       0,10937           4,25883     2,00E-005
AGE.05-09                 0,18078                  -1,71048     0,2342            -7,30341    0
AGE.10-14                 0,23722                  -1,43875     0,20368           -7,06396    0
AGE.15-19                 0,57697                  -0,54997     0,13639           -4,0322     6,00E-005
AGE.25-29                 1,0893                   0,08553      0,11013           0,77661     0,43769
AGE.30-34                 1,51713                  0,41682      0,09924           4,20014     3,00E-005
AGE.35-39                 2,53994                  0,93214      0,09366           9,95211     0
AGE.40-44                 5,07895                  1,6251       0,08822           18,42167    0
AGE.45-49                 10,66252                 2,36674      0,08489           27,88101    0
AGE.50-54                 19,33157                 2,96174      0,08277           35,78133    0
AGE.55-59                 25,84247                 3,25202      0,08227           39,52974    0
AGE.60-64                 35,78517                 3,57753      0,08218           43,53171    0
AGE.65-69                 47,23767                 3,85519      0,08638           44,62836    0
AGE.70-74                 66,82966                 4,20215      0,08637           48,65094    0
AGE.75-79                 99,48389                 4,6          0,08621           53,35728    0
AGE.80-84                 149,20587                5,00533      0,08633           57,98004    0
AGE.85-X                  1182,76266               7,07561      0,08541           82,84178    0
GENDER.F                  0,30096                  -1,20079     0,17171           -6,99319    0
AGE.00-04:GENDER.F        3,05752                  1,11761      0,20064           5,57023     0
AGE.05-09:GENDER.F        2,52642                  0,9268       0,37788           2,45267     0,01445
AGE.10-14:GENDER.F        2,56568                  0,94222      0,33263           2,83266     0,00477
AGE.15-19:GENDER.F        1,73553                  0,55132      0,25422           2,16869     0,03049
AGE.25-29:GENDER.F        1,05155                  0,05026      0,2289            0,21959     0,82626
AGE.30-34:GENDER.F        1,35516                  0,30392      0,20089           1,51284     0,13083
AGE.35-39:GENDER.F        1,57186                  0,45226      0,18913           2,39121     0,01709
AGE.40-44:GENDER.F        1,42679                  0,35542      0,18138           1,9596      0,05049
AGE.45-49:GENDER.F        1,37772                  0,32043      0,176             1,82066     0,06914
AGE.50-54:GENDER.F        1,32005                  0,27767      0,17263           1,60852     0,10823
AGE.55-59:GENDER.F        1,26126                  0,23211      0,17182           1,3509      0,17722
AGE.60-64:GENDER.F        1,34192                  0,2941       0,1715            1,71491     0,08686
AGE.65-69:GENDER.F        1,47544                  0,38895      0,17538           2,21773     0,02694
AGE.70-74:GENDER.F        1,69597                  0,52825      0,17511           3,01672     0,00266
AGE.75-79:GENDER.F        1,99159                  0,68893      0,17479           3,9415      9,00E-005
AGE.80-84:GENDER.F        2,4072                   0,87846      0,17474           5,02732     0
AGE.85-X:GENDER.F         2,0762                   0,73054      0,174             4,19859     3,00E-005

AGE ref.level: .20-24

GENDER ref.level: .MALE

random part

age (2 groups): AGE 64 and younger, 65 and older

MEGYE         GENDER  AGGE   Coefficient  exp()
.Budapest     .M      .-64   -0,213925    0,810314
.Budapest     .M      .65+   -0,094141    0,910665
.Budapest     .F      .-64   -0,056986    0,946663
.Budapest     .F      .65+   -0,247406    0,78316
.Pest         .M      .-64   -0,089623    0,917565
.Pest         .M      .65+   -0,021785    0,979
.Pest         .F      .-64   -0,021977    0,980392
.Pest         .F      .65+   -0,008711    0,994294
.Fejér        .M      .-64   0,000008     1,003677
.Fejér        .M      .65+   0,024822     1,025709
.Fejér        .F      .-64   -0,01569     0,986574
.Fejér        .F      .65+   0,033056     1,036701


.Vas

.Vas

.Vas

.Zala

.Zala

.Zala

.Zala

.Baranya

MEGYE

.Komárom-

Esztergom

.Komárom-

Esztergom

.Komárom-

Esztergom

.Komárom-

Esztergom

.Veszprém

.Veszprém

.Veszprém

.Veszprém

.Győr-Sopron

.Győr-Sopron

.Győr-Sopron

.Győr-Sopron

.Vas

.Baranya

.Baranya

.Baranya

.F

.M

.F

.F

.M

.F

.F

.M

.M

.M

.F

.F

.M

.M

.M

.F

.F

.M

.F

.F

.M

GENDER

.M

.M

.F


AGGE

.-64

Coefficient

0,077849

.65+

.-64

.65+

.-64

.65+

.-64

.65+

.-64

.65+

.-64

.-64

.65+

.-64

.65+

.-64

.65+

.65+

.-64

.65+

.-64

.65+

.-64

.65+

0,042885

0,027264

0,127112

-0,060216

-0,024939

-0,003444

0,025449

-0,093541

-0,018875

-0,116642

-0,074526

-0,038448

-0,010211

-0,050228

-0,063834

-0,092797

-0,006371

-0,106337

-0,070847

-0,02283

0,000535

0,021527

0,010782


exp()

1,084848

1,044404

1,029875

1,138942

0,89184

0,930961

0,965744

0,990397

0,953082

0,940968

0,914658

0,944949

0,975917

0,99873

1,028845

0,913977

0,981853

0,994207

0,901078

0,934392

0,980945

1,001097

1,023984

1,013866

.M

.F

.M

.F

.F

.M

GENDER

.M

.M

.F

.F

.M

.F

.M

.F

.F

.M

.F

.F

.M

.M

.M

.F

.F

.M

.Borsod-Abaúj-

Zemplén

.Borsod-Abaúj-

Zemplén

.Borsod-Abaúj-

Zemplén

.Heves

.Heves

.Heves

.Heves

.Nógrád

.Nógrád

.Nógrád

.Nógrád

.Hajdú-Bihar

.Hajdú-Bihar

.Hajdú-Bihar

.Hajdú-Bihar

MEGYE

.Somogy

.Somogy

.Somogy

.Somogy

.Tolna

.Tolna

.Tolna

.Tolna

.Borsod-Abaúj-

Zemplén

.65+

.-64

.65+

.65+

.-64

.65+

.-64

.65+

.-64

.65+

.-64

.65+

.-64

.65+

.-64


.65+

.-64

.65+

.-64

AGGE

.-64

.65+

.-64

.65+

.-64

Coefficient

0,05887

0,030198

0,069468

0,085356

-0,014433

-0,020551

-0,066871

-0,015509

0,171443

0,03615

0,145759

0,047318

0,067652

0,034561

-0,00225

0,00033

0,037194

0,032501

0,033666

0,087144

-0,053384

0,005554

-0,050188

0,002241


exp()

1,064453

1,031237

1,07427

1,092364

0,989217

0,980209

0,937351

0,987557

1,191287

1,037393

1,159435

1,051593

1,073842

1,035746

0,999924

1,003323

1,041628

1,033615

1,036489

1,094319

0,951426

1,006134

0,95312

1,005243


MEGYE GENDER

.Jász-Nkun-Szolnok .M

.Jász-Nkun-Szolnok .M

.Jász-Nkun-Szolnok .F

.Jász-Nkun-Szolnok .F

.Szabolcs-Szatmár .M

AGGE

.-64

.65+

.-64

.65+

.-64

Coefficient

0,078645

0,002624

0,095987

0,025507

0,072532 exp()

1,085712

1,003191

1,10314

1,028905

1,079096

.Szabolcs-Szatmár .M

.Szabolcs-Szatmár .F

.Szabolcs-Szatmár .F

.Bács-Kiskun .M

.Bács-Kiskun

.Bács-Kiskun

.Bács-Kiskun

.Békés

.M

.F

.F

.M

.Békés

.Békés

.Békés

.Csongrád

.M

.F

.F

.M

.65+

.-64

.65+

.-64

.65+

.-64

.65+

.-64

0,048643

-0,009299

0,085568

0,041893

-0,03363

0,00419

-0,039088

0,060994

1,050435

0,9929

1,092596

1,046535

0,967472

1,006384

0,964544

1,066716

.65+

.-64

.65+

.-64

-0,00322

0,094299

-0,035073

-0,059777

0,997345

1,10128

0,968424

0,945363

.Csongrád

.Csongrád

.M

.F

.65+

.-64

-0,035977

-0,03572

0,965204

0,96701

.Csongrád .F

.65+ -0,034631 0,968852

Comparing the estimates of models 5.1.1 and 5.1.2 is not as simple as the comparison in chapter 4, because the data structure is slightly more complicated: the place of residence is classified both by county and by region. In example 5.1.2 the random factor is modelled in a more sophisticated way; for the statistical summary see subsection 5.2. This is a very simple example of a hierarchical (also called "multilevel") analysis. The term hierarchical is commonly taken to refer to the hierarchical (nested) structure of the data, but in fact it describes the stochastic structure of the model: randomness is assumed not only for the outcome variable but also for the parameter(s) that determine the distribution of the outcome variable. This far-reaching generalization of the random factor model is an active research area of modern statistics: estimating hierarchical models by simulation makes it possible to develop more flexible statistical procedures that fit the needs of applications.

2. Examples of analysis of geographic data

In this section we examine the association between the model parameter estimates obtained in exercise 5.1 and the raw mortality data.

We draw maps of counties with the model parameter estimates, and we examine how much these maps resemble the age and gender specific mortality maps.

Figure 5.3.1. Mortality rate by county (/100 000 persons)

Figure 5.3.2. Age distribution of the counties (population of each county is taken as 100%)

The above maps are based on the crude mortality data published by the CSO. The three maps look different, but they share the feature that they are all different from the county specific estimates obtained in model 5.1.1.

The crude mortality rates for the total population and for women are in the same category for Szabolcs-Szatmár and Győr-Sopron counties, and the crude mortality rate of men is better in Szabolcs-Szatmár county. The estimate for Győr-Sopron county, on the other hand, is much better in the Poisson model than that for Szabolcs-Szatmár county. The reason is evidently the different age distribution seen above. The following maps show the proportion of elderly people by county.

Figure 5.3.3. Proportion of persons at age 45 years or more by county

Figure 5.3.4. Proportion of persons at age 65 years or more by county


Figure 5.3.5. Proportion of persons at age 85 years or more by county

One can see that the proportion of persons aged 45+ years is in the same category for Szabolcs-Szatmár and Győr-Sopron counties, but there is a difference in the proportion of persons aged 65+ years. The population of Győr-Sopron county is older, and Poisson regression takes this into account.

The incidence rate ratio is corrected for the different age distribution in the Poisson model.
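The effect of this correction can be illustrated with a minimal sketch. All numbers below are invented for illustration (two hypothetical counties, "young" and "old", with identical age-specific death rates but different age distributions); they are not the KSH data, and the variable names are our own.

```r
# Hypothetical data: two counties with IDENTICAL age-specific death rates
# but different age distributions (all numbers are invented).
rate <- c("45-64" = 0.01, "65-84" = 0.05, "85+" = 0.20)  # deaths per person

d <- expand.grid(AGE = names(rate), COUNTY = c("young", "old"))
d$N <- c(60000, 35000, 5000,    # "young" county: few elderly
         40000, 45000, 15000)   # "old" county: many elderly
# Expected deaths are used directly as the observed counts.
d$Y <- round(unname(d$N * rate[as.character(d$AGE)]))

# Crude rates per 100 000: the older county looks much worse.
crude <- tapply(d$Y, d$COUNTY, sum) / tapply(d$N, d$COUNTY, sum) * 1e5

# Poisson model with log population as offset, adjusting for age.
fit <- glm(Y ~ offset(log(N)) + AGE + COUNTY,
           family = poisson(link = "log"), data = d)
irr <- exp(coef(fit)[["COUNTYold"]])  # age-adjusted rate ratio, old vs young

print(round(crude))
print(round(irr, 3))
```

In this constructed example the crude rates are 3350 and 5650 per 100 000, while the age-adjusted incidence rate ratio is 1: the whole crude difference comes from the age distribution.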

Figure 5.3.6. Proportion of men at age 45 years or more by county

Figure 5.3.7. Proportion of men at age 65 years or more by county

Figure 5.3.8. Proportion of men at age 85 years or more by county

Figure 5.3.9. Proportion of women at age 45 years or more by county

Figure 5.3.10. Proportion of women at age 65 years or more by county

Figure 5.3.11. Proportion of women at age 85 years or more by county

Comparing Szabolcs-Szatmár and Győr-Sopron, one can see that among men it is especially the proportion aged 85+ that differs. Since the mortality rate is very high in this age group, this explains the higher crude mortality rate of men in Győr-Sopron county.

It can also be seen that the proportion of elderly people is very high in Budapest and very low in Pest county.

Figure 5.3.12. Age specific mortality estimated in the Poisson model for Budapest and Bács-Kiskun counties

Figure 5.3.13. Age specific mortality estimated in the Poisson model for Győr-Sopron and Szabolcs-Szatmár counties

Figure 5.3.14. Age specific mortality estimated in the Poisson model for Vas and Nógrád counties

Table 5.3.1. Mortality of men (/100 000 persons) in 2009 (Source: KSH)

County                    Age 45-54  Age 55-64  Age 65-74  Age 75-84    Age 85+
Budapest                        934       1925       3378       7525      16450
Pest                           1098       2152       4126       8989      17839
Fejér                          1222       2214       4696       9666      17484
Komárom-Esztergom              1267       2639       4921       9944      20074
Veszprém                       1167       2164       4213       8719      17339
Győr-Moson-Sopron              1064       2153       4147       9140      20647
Vas                            1083       2392       4151       9486      17535
Zala                           1087       2021       3997       9921      18634
Baranya                        1150       2236       4383       9442      17598
Somogy                         1366       2377       4491      10439      20564
Tolna                          1114       2345       4314       8714      15956
Borsod-Abaúj-Zemplén           1435       2716       4979       9653      18802
Heves                          1319       2552       5061      10225      20748
Nógrád                         1211       2372       4158      10314      20331
Hajdú-Bihar                    1122       2593       4650       9620      17600
Jász-Nagykun-Szolnok           1325       2231       3913       9340      17027
Szabolcs-Szatmár-Bereg         1242       2574       4664      10103      18950
Bács-Kiskun                    1216       2564       4555       8568      19168
Békés                          1275       2226       4434       9022      16857
Csongrád                       1069       2441       4532       8897      19160

Table 5.3.2. Mortality of women (/100 000 persons) in 2009 (Source: KSH)

County                    Age 45-54  Age 55-64  Age 65-74  Age 75-84    Age 85+
Budapest                        492        897       1867       5343      14547
Pest                            472        918       1959       6119      15776
Fejér                           498        888       2055       5904      16208
Komárom-Esztergom               525        947       2384       6775      15859
Veszprém                        462        912       1975       6241      15226
Győr-Moson-Sopron               390        775       1814       5879      15638
Vas                             392        863       1831       5805      17246
Zala                            377        773       2038       5867      16196
Baranya                         545        946       1989       6036      16106
Somogy                          555       1036       2283       6484      17326
Tolna                           366        873       2070       6207      14967
Borsod-Abaúj-Zemplén            526       1090       2314       6256      14999
Heves                           456        919       2207       6327      15416
Nógrád                          486        966       2237       6785      17176
Hajdú-Bihar                     465        814       1838       6123      16027
Jász-Nagykun-Szolnok            588       1031       2135       6033      16860
Szabolcs-Szatmár-Bereg          455        931       2224       6459      16716
Bács-Kiskun                     465        908       1925       5831      16106
Békés                           516       1033       2152       5917      15253
Csongrád                        480        870       1987       6104      16064

The Poisson model (like any other parametric model) expresses the associations in its own structure. We selected some counties and plotted the estimated age-specific mortalities.

These figures show the mechanism by which the different age distributions (or, more generally, differences in the distribution of any explanatory variable) are taken into account by the Poisson model.
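A minimal sketch of how such model-based age-specific rates can be computed is given below. It uses the male rates of Budapest and Pest as read from Table 5.3.1, and it assumes, purely for illustration, a population of exactly 100 000 in every cell so that each rate can stand in for a death count. The real cell populations differ, so the output does not reproduce the figures; the model is also reduced to AGE and COUNTY only.

```r
# Male mortality per 100 000 as read from Table 5.3.1 (Budapest and Pest).
# ASSUMPTION: every cell has a population of exactly 100 000, so the rate
# doubles as the death count; the real cell populations are not used here.
age <- c("45-54", "55-64", "65-74", "75-84", "85+")
d <- data.frame(
  AGE    = factor(rep(age, times = 2), levels = age),
  COUNTY = factor(rep(c("Budapest", "Pest"), each = 5)),
  Y      = c(934, 1925, 3378, 7525, 16450,
             1098, 2152, 4126, 8989, 17839),
  N      = 1e5
)

# Additive model on the log scale: one common age profile,
# shifted by a county-specific constant.
fit <- glm(Y ~ offset(log(N)) + AGE + COUNTY,
           family = poisson(link = "log"), data = d)

# Model-based age-specific rates per 100 000 for each county.
d$fitted_rate <- predict(fit, type = "response") / d$N * 1e5
ratio <- with(d, fitted_rate[COUNTY == "Pest"] / fitted_rate[COUNTY == "Budapest"])

print(xtabs(round(fitted_rate) ~ AGE + COUNTY, data = d))
print(round(ratio, 3))  # the same Pest/Budapest ratio in every age group
```

The printed ratio is identical in all five age groups. This is the structure the model imposes: the county effect is a single multiplicative factor applied to a common age profile, which is why the estimated curves of two counties run parallel on the log scale.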

Figure 5.3.15. Mortality rate (/100 000 persons) at age 45 years or more.

Figure 5.3.16. Mortality rate (/100 000 persons) at age 65 years or more.

Figure 5.3.17. Mortality rate (/100 000 persons) at age 85 years or more.

Figure 5.3.18. Mortality rate (/100 000 men) by county

Figure 5.3.19. Mortality rate (/100 000 men) at age 45 years or more.

Figure 5.3.20. Mortality rate (/100 000 men) at age 65 years or more.

Figure 5.3.21. Mortality rate (/100 000 men) at age 85 years or more.

Figure 5.3.22. Mortality rate (/100 000 women) by county

Figure 5.3.23. Mortality rate (/100 000 women) at age 45 years or more.

Figure 5.3.24. Mortality rate (/100 000 women) at age 65 years or more.

Figure 5.3.25. Mortality rate (/100 000 women) at age 85 years or more.

The age- and gender-specific mortality rates bring us closer to understanding the county-specific incidence rate ratios estimated by Poisson model 5.1.1. Probably the most surprising result is that Budapest has by far the best value, which is due to the very favorable mortality of women aged 85+.

It is important to understand that this phenomenon can partly be seen on the crude mortality maps as well, and that the Poisson model reflects it only to the extent that it can be seen there.

Figure 5.3.26. IDR components (/1000) as calculated by model 5.1.1.

Figure 5.3.27. IDR components (/1000) as calculated by model 5.1.2.

REGIO (region) is a fixed variable of the hierarchical Poisson model and MEGYE (county) is a random factor nested within it. The figures demonstrate the "shrinkage" effect of model 5.1.2 compared with model 5.1.1: some of the counties moved closer to each other within a region.
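The shrinkage mechanism itself can be sketched in a few lines of base R. The block below is not the h-likelihood algorithm of the hglm package; it swaps in a simple normal-approximation random-effects calculation with a moment estimate of the between-county variance. The counts and populations are invented for five hypothetical counties of one region.

```r
# Hypothetical death counts and populations for five counties of one region
# (invented numbers; the smallest county has the most extreme raw rate).
Y <- c(30, 260, 520, 480, 1150)
N <- c(2000, 24000, 50000, 52000, 100000)

log_rate <- log(Y / N)  # county-specific estimates without pooling
v        <- 1 / Y       # approximate sampling variance of a log Poisson rate

# Precision-weighted regional mean and a moment estimate of the
# between-county variance tau^2 (truncated at zero).
w    <- 1 / v
mu   <- sum(w * log_rate) / sum(w)
Q    <- sum(w * (log_rate - mu)^2)
tau2 <- max(0, (Q - (length(Y) - 1)) / (sum(w) - sum(w^2) / sum(w)))

# Shrinkage: each estimate is pulled toward the regional mean; the less
# precise the county estimate, the smaller its weight B and the stronger the pull.
B      <- tau2 / (tau2 + v)
shrunk <- mu + B * (log_rate - mu)

print(round(cbind(raw_per_1000    = Y / N * 1000,
                  shrunk_per_1000 = exp(shrunk) * 1000,
                  weight          = B), 3))
```

Every shrunken estimate lies between the raw county estimate and the regional mean, and the county with the fewest deaths is moved the most. This is the same qualitative behaviour as counties moving closer to each other within a region.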

In brief: the mixed model 5.1.1 can be generalized in two ways, first as a hierarchical model (model 5.1.2), then by involving multiple random factors (model 5.1.3). Goodness-of-fit tests can be performed by bootstrap (Gelman, R package mi).
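One way such a bootstrap check could look is sketched below for a fixed coefficient Poisson model. It uses base R only and is not the procedure implemented in the mi package; the data are simulated, and the names (d, base_rate, n_boot) are our own.

```r
set.seed(1)

# Hypothetical cell-level data (simulated): deaths Y and population N
# by age group and county.
d <- expand.grid(AGE = c("45-64", "65-84", "85+"), COUNTY = paste0("c", 1:6))
d$N <- round(runif(nrow(d), 5000, 50000))
base_rate <- c("45-64" = 0.01, "65-84" = 0.05, "85+" = 0.20)
d$Y <- rpois(nrow(d), unname(d$N * base_rate[as.character(d$AGE)]))

fit <- glm(Y ~ offset(log(N)) + AGE + COUNTY,
           family = poisson(link = "log"), data = d)
dev_obs <- deviance(fit)

# Parametric bootstrap: simulate counts from the fitted model, refit,
# and collect the residual deviance of each refit.
n_boot <- 200
dev_boot <- replicate(n_boot, {
  d_star <- d
  d_star$Y <- rpois(nrow(d), fitted(fit))
  deviance(glm(Y ~ offset(log(N)) + AGE + COUNTY,
               family = poisson(link = "log"), data = d_star))
})

# Bootstrap p-value: share of simulated deviances at least as large
# as the observed deviance.
p_value <- mean(dev_boot >= dev_obs)
print(c(deviance = dev_obs, p_value = p_value))
```

A small p-value would indicate that the observed deviance is larger than the model can plausibly produce, for example because of overdispersion.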

References

[bib_1] Agresti A. 1990. Categorical Data Analysis. Wiley.

[bib_2] Faraway J.J. 2002. Practical Regression and Anova using R. http://cran.rproject.org/doc/contrib/Faraway-PRA.pdf.

[bib_3] Gelman A. and Hill J. 2007. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge Univ. Press.

[bib_4] Gelman A., Hill J., and Yajima M. M.G.: Missing Data Imputation and Model Checking. http://CRAN.R-project.org/package=mi.

[bib_5] Hoeting J.A., Madigan D., and Raftery A.E. 1999. Bayesian model averaging: A tutorial with discussion. Statistical Science, 14. 382-417. http://projecteuclid.org/DPubS/Repository/1.0/Disseminate?view=body&id=pdf_1&handle=euclid.ss/1009212519.

[bib_6] Raftery A.E., Volinsky C.T., Hoeting J.A., Painte I., and Ka Yee Yeung. BMA: Bayesian Model Averaging. http://CRAN.R-project.org/package=BMA.

[bib_7] McLeod A.I. and Xu C. Best Subset GLM. http://CRAN.R-project.org/package=bestglm.

[bib_8] Snijders T.A.B. 2005. Fixed and Random Effects. In: B.S. Everitt and D.C. Howell (eds.), Encyclopedia of Statistics in Behavioral Science. Volume 2. 664-665. Wiley.

Appendix A. Appendix: R scripts of regression models

These R scripts can be downloaded from http://web.tatk.elte.hu/~eregr/

The data sets are from the open database of the Hungarian Central Statistical Office (KSH).

1. Fixed coefficient Poisson model

    # Poisson regression of death counts, with log population (LOGN) as offset
    glm(Y ~ offset(LOGN) + AGE + GENDER + MEGYE,
        family = poisson(link = "log"),
        data = read.csv2(file = "http://web.tatk.elte.hu/~eregr/vmort.csv"))

2. Fixed and random coefficient Poisson model

    require(hglm)
    # County (MEGYE) enters as a Gaussian random intercept
    hglm(fixed = Y ~ offset(LOGN) + AGE + GENDER,
         random = ~ 1 | MEGYE,
         family = poisson(link = "log"),
         rand.family = gaussian(link = "identity"),
         data = read.csv2(file = "http://web.tatk.elte.hu/~eregr/vmort.csv"))

3. Fixed coefficient negative-binomial model

    require(MASS)
    # Negative-binomial counterpart of script 1, allowing for overdispersion
    glm.nb(Y ~ offset(LOGN) + AGE + GENDER + MEGYE,
           data = read.csv2(file = "http://web.tatk.elte.hu/~eregr/vmort.csv"))

4. Fixed coefficient logistic-binomial model

    # Deaths (Y) out of population (N) as a two-column binomial response
    glm(cbind(Y, N - Y) ~ AGE + GENDER + MEGYE,
        family = binomial(link = "logit"),
        data = read.csv2(file = "http://web.tatk.elte.hu/~eregr/vmort.csv"))

5. Fixed and random coefficient logistic-binomial model

    require(hglm)
    # Proportion of deaths as response, county as Gaussian random intercept
    hglm(fixed = Y/N ~ AGE + GENDER,
         random = ~ 1 | MEGYE,
         family = binomial(link = logit),
         rand.family = gaussian(link = "identity"),
         data = read.csv2(file = "http://web.tatk.elte.hu/~eregr/vmort.csv"))

6. Fixed and multiple random coefficient Poisson model

    require(hglm)
    # Random intercept for every county-by-gender-by-age combination
    hglm(fixed = Y ~ offset(LOGN) + AGE + GENDER + AGE:GENDER,
         random = ~ 1 | MEGYE:GENDER:AGE,
         family = poisson(link = log),
         rand.family = gaussian(link = "identity"),
         method = "REML",
         data = read.csv2(file = "http://web.tatk.elte.hu/~eregr/vmort.csv"))

7. Hierarchical Poisson model

    require(hglm)
    # Region (REGIO) is a fixed effect and also models the dispersion;
    # county (MEGYE) is a random intercept
    hglm(fixed = Y ~ offset(LOGN) + AGE + GENDER + AGE:GENDER + REGIO,
         random = ~ 1 | MEGYE,
         disp = ~ REGIO,
         family = poisson(link = log),
         rand.family = gaussian(link = "identity"),
         method = "REML",
         data = read.csv2(file = "http://web.tatk.elte.hu/~eregr/vmort.csv"))
