Poisson Rate, Intro to Matched Pair Data Stat 557 Heike Hofmann

Final Project
• Choose topic for final project
• Send email with one paragraph write-up with
  • explanation of the data
  • suggestion for analysis
  • sample of the data (or web-link to data)
• In case you are working in a team (1-3 people) send me team information

Outline
• Poisson Rate
• Matched Pair Data

Poisson Regression
• Event occurrences are proportional to observed time, or space, or another index of size
• Want: model the rate of occurrence
• e.g. study of homicides in a given year in a sample of cities:
  • model rate (= homicides/population size) for a city.
  • Explanatory factors might be unemployment rate, residents’ median income, percentage of high school graduates, ...

Poisson Rates
• For size index t, model the rate at which events occur: $\log(\mu_i / t_i) = \alpha + \beta x_i$
• equivalent to $\log \mu_i = \log t_i + \alpha + \beta x_i$
• use offset() for the term $\log t_i$ so that its coefficient is not estimated, but fixed at 1

Example: Crimes across US
• FBI publishes data on number of crimes by type and state for each year.
• For 2009:

    Variable                               Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
    Population                           544270   1802408   4403094   6128138   6647091  36961664
    Violent.crime                           817      5456     15968     26207     30481    174459
    Murder.and.nonnegligent.manslaughter   7.00     37.75    176.50    301.94    424.25   1972.00
    Forcible.rape                         124.0     562.8    1263.5    1758.9    2080.8    8713.0
    Robbery                                  77      1201      3810      8077      9260     64093
    Aggravated.assault                      575      3610     10297     16069     20017     99681
    Property.crime                        12502     47968    132868    185850    226611   1009614
    Burglary                               2230      9871     29432     43909     51821    240233
    Larceny.theft                          9296     34424     89563    126160    153502    678353
    Motor.vehicle.theft                     448      3583     10136     15782     17736    164021

    State:    Alabama, Alaska, Arizona, Arkansas, California, Colorado: 1 each; (Other): 44
    Abbr:     AK, AL, AR, AZ, CA, CO: 1 each; (Other): 44
    Region:   Midwest: 12; Northeast: 9; South: 16; West: 13
    Division: Mountain: 8; South Atlantic: 8; West North Central: 7; New England: 6;
              East North Central: 5; Pacific: 5; (Other): 11

[Figure: Violent.crime/Population * 1e+05 by state, plotted over long (x-axis) and lat (y-axis)]

[Figure: Murder/Population * 1e+05 by state, plotted over long (x-axis) and lat (y-axis)]

    glm(formula = Violent.crime ~ Region + offset(log(Population/1e+05)),
        family = poisson(link = log), data = crime)

    Deviance Residuals:
         Min       1Q   Median       3Q      Max
     -118.04   -36.92   -22.78    23.71    72.60

    Coefficients:
                     Estimate Std. Error z value Pr(>|z|)
    (Intercept)      5.955204   0.001969 3023.85   <2e-16 ***
    RegionNortheast -0.073860   0.002988  -24.72   <2e-16 ***
    RegionSouth      0.238827   0.002385  100.12   <2e-16 ***
    RegionWest       0.090777   0.002681   33.86   <2e-16 ***
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    (Dispersion parameter for poisson family taken to be 1)

        Null deviance: 109966 on 49 degrees of freedom
    Residual deviance:  90772 on 46 degrees of freedom
    AIC: 91346

    Number of Fisher Scoring iterations: 4

    glm(formula = Murder ~ Region + offset(log(Population/1e+05)),
        family = poisson(link = log), data = crime)

    Deviance Residuals:
         Min       1Q   Median       3Q      Max
     -12.599   -4.917   -2.109    1.142   14.193

    Coefficients:
                     Estimate Std. Error z value Pr(>|z|)
    (Intercept)      1.519367   0.018095  83.965  < 2e-16 ***
    RegionNortheast -0.179513   0.028305  -6.342 2.27e-10 ***
    RegionSouth      0.261676   0.021838  11.983  < 2e-16 ***
    RegionWest      -0.008964   0.025219  -0.355    0.722
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    (Dispersion parameter for poisson family taken to be 1)

        Null deviance: 1914.1 on 49 degrees of freedom
    Residual deviance: 1504.5 on 46 degrees of freedom
    AIC: 1850.8

    Number of Fisher Scoring iterations: 4

Example: Heart Attacks
• 109 patients:
  • heart valve (aortic/mitral)
  • age (<55, ≥55)
  • survival (in months)

           Time at risk (in months)      Deaths
           aortic    mitral              aortic    mitral
    <55      1259      2082                   4         1
    ≥55      1417      1647                   7         9

    glm(formula = Deaths ~ Valve + Age + offset(log(Exposure)),
        family = poisson, data = heart)

    Deviance Residuals:
         1       2       3       4
     1.025  -1.197  -0.602   0.613

    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)  -6.3121     0.5066 -12.460   <2e-16 ***
    ValveMitral  -0.3299     0.4382  -0.753   0.4515
    Age55+        1.2209     0.5138   2.376   0.0175 *
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    (Dispersion parameter for poisson family taken to be 1)

        Null deviance: 10.8405 on 3 degrees of freedom
    Residual deviance:  3.2225 on 1 degrees of freedom
    AIC: 22.349

    Number of Fisher Scoring iterations: 5

Poisson Rates: Identity Link
• For size index t, model the rate at which events occur: $\mu_i / t_i = \alpha + \beta x_i$
• equivalent to $\mu_i = \alpha t_i + \beta x_i t_i$
• This is a linear model without an intercept: each predictor is multiplied by the index.
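Before turning to the identity-link fit, the log-link rate model for the heart valve data can be refit from the four cells of the table. The following is a minimal sketch, not the original course code: the data frame `heart` is rebuilt by hand here, and the factor labels `Aortic`, `Mitral`, `<55`, `55+` and the object names `rate_hat` and `rate_from_coef` are assumptions chosen to match the coefficient names in the printed output. It checks that, with `offset(log(Exposure))`, the fitted count divided by the exposure equals exp(α + β x), i.e. the model is a model for the rate.

```r
# Heart valve data, one row per Valve x Age cell (values from the slide tables).
heart <- data.frame(
  Valve    = factor(c("Aortic", "Mitral", "Aortic", "Mitral"),
                    levels = c("Aortic", "Mitral")),
  Age      = factor(c("<55", "<55", "55+", "55+"),
                    levels = c("<55", "55+")),
  Exposure = c(1259, 2082, 1417, 1647),  # time at risk in months
  Deaths   = c(4, 1, 7, 9)
)

# Log link with offset: log(mu_i) = log(t_i) + alpha + beta' x_i.
# The coefficient of log(Exposure) is fixed at 1, not estimated.
fit <- glm(Deaths ~ Valve + Age + offset(log(Exposure)),
           family = poisson, data = heart)

# Fitted rate per month at risk: fitted count divided by exposure.
rate_hat <- fitted(fit) / heart$Exposure

# The same rate computed from the coefficients alone (no offset term).
rate_from_coef <- exp(drop(model.matrix(fit) %*% coef(fit)))

print(coef(fit))
print(cbind(heart, rate_hat, rate_from_coef))
```

Because the offset enters the linear predictor with a fixed coefficient of 1, `exp(coef(fit))` can be read as rate ratios: for example, the exponentiated `Age55+` coefficient compares the death rate per month at risk between the two age groups.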
Identity Link

    glm(formula = Deaths ~ I(as.integer(Valve) * Exposure) +
        I(as.integer(Age) * Exposure) + Exposure - 1,
        family = poisson(link = identity), data = heart)

    Deviance Residuals:
          1        2        3        4
     0.4550  -0.1812  -0.7494   0.5400

    Coefficients:
                                      Estimate Std. Error z value Pr(>|z|)
    I(as.integer(Valve) * Exposure) -0.0019354  0.0013158  -1.471  0.14132
    I(as.integer(Age) * Exposure)    0.0039663  0.0014399   2.755  0.00588 **
    Exposure                         0.0004772  0.0032507   0.147  0.88329
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    (Dispersion parameter for poisson family taken to be 1)

        Null deviance:    Inf on 4 degrees of freedom
    Residual deviance: 1.0931 on 1 degrees of freedom
    AIC: 20.22

    Number of Fisher Scoring iterations: 7

Matched Pair Data
Example: Approval Ratings
Same 1600 subjects asked to rate British Prime Minister (Tony Blair)

                      2nd Rating
    1st Rating     Approve   Disapprove
    Approve            794          150
    Disapprove          86          570

Matched Pair Data
• Observations are taken repeatedly from the same subject, or
• Individuals with similar demographics are paired

Matched Pair Data
Assumptions

                      2nd Rating
    1st Rating     Approve   Disapprove
    Approve            794          150
    Disapprove          86          570

• Diagonal heavily loaded
• Association usually strongly positive (most people don’t change their opinion)
• Distinguish between movers & stayers

Marginal Homogeneity
• Did as many people move from category a as to category a?
• $H_0: \pi_{a+} = \pi_{+a}$
• For binary response: McNemar: $(n_{21} - n_{12})^2 / (n_{12} + n_{21}) \sim \chi^2_1$

    > mcnemar.test(matrix(c(794,150,86,570),byrow=T,ncol=2),correct=F)

        McNemar's Chi-squared test

    data: matrix(c(794, 150, 86, 570), byrow = T, ncol = 2)
    McNemar's chi-squared = 17.3559, df = 1, p-value = 3.099e-05

Subject-specific Tables
• For binary responses $Y_1, Y_2$, we can think of a record as one of the four instances:

    stayers (794 subjects)      stayers (570 subjects)
           yes   no                    yes   no
    1st      1    0             1st      0    1
    2nd      1    0             2nd      0    1

    movers (150 subjects)       movers (86 subjects)
           yes   no                    yes   no
    1st      1    0             1st      0    1
    2nd      0    1             2nd      1    0

Subject-specific Tables
• Adding all 1600 of these tables

           yes    no
    1st    944   656
    2nd    880   720

Marginal homogeneity then translates to whether the probability of approval is the same for the 1st and 2nd rating.

Marginal Homogeneity
Use model to test for conditional independence
• $P(Y_t = 1 \mid x_t) = \alpha + \beta x_t$
• $x_t$ is a dummy variable for the time points: $x_1 = 0$, $x_2 = 1$
• Marginal Homogeneity/Conditional Independence: $\beta = 0$

    glm(formula = Approval ~ Rating, family = binomial(link = logit),
        data = pm, weights = count)

    Deviance Residuals:
         1       2       3       4
    -31.56   34.20  -32.44   33.91

    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept) -0.52726    0.11340  -4.649 3.33e-06 ***
    Rating       0.16329    0.07148   2.285   0.0223 *
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 4373.2 on 3 degrees of freedom
    Residual deviance: 4368.0 on 2 degrees of freedom
    AIC: 4372

    Number of Fisher Scoring iterations: 4

Marginal Homogeneity
Alternatively, use logit link:
• $\operatorname{logit} P(Y_t = 1 \mid x_t) = \alpha + \beta x_t$
• $x_t$ is a dummy variable for the time points: $x_1 = 0$, $x_2 = 1$
Then β is the log odds ratio based on the overall population

Subject Specific Model
• $\operatorname{link} P(Y_{it} = 1) = \alpha_i + \beta x_t$
• $x_t$ is a dummy variable for the time points: $x_1 = 0$, $x_2 = 1$
• then
  $\alpha_i = \operatorname{link} P(Y_{i1} = 1)$
  $\beta = \operatorname{link} P(Y_{i2} = 1) - \operatorname{link} P(Y_{i1} = 1)$

Marginal vs Subject-Specific Model
Estimates for β
• are identical for the marginal model and the subject-specific model in case of the identity link
• are different for the logit link
  • marginal model: $\beta = \operatorname{logit} P(Y_2 = 1 \mid x_2) - \operatorname{logit} P(Y_1 = 1 \mid x_1)$
  • subject specific, for all i: $\beta = \operatorname{logit} P(Y_{i2} = 1 \mid x_2) - \operatorname{logit} P(Y_{i1} = 1 \mid x_1)$

Subject-Specific Model
• $\operatorname{logit} P(Y_{it} = 1) = \alpha_i + \beta x_t$
• Assumptions generally:
  • responses from different subjects are independent (for all i)
  • responses for different time points are independent

Subject-Specific Model
• Violation of independence is taken care of by the model structure:
  • Generally, $|\alpha_i| \gg |\beta|$
  • When $|\alpha_i|$ is small, we have the most variability between responses of the same individual, i.e. the least dependence. These are the records on which the estimation of β is based.
  • For large $|\alpha_i|$, $P(Y_{it} = 1)$ is either close to 0 or close to 1 (largest dependence in the data)

Fitting the Subject Specific Model
• $\operatorname{link} P(Y_{it} = 1) = \alpha_i + \beta x_t$
• for large i, fitting $\alpha_i$ becomes problematic: condition out
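The McNemar statistic for the approval ratings can be checked by hand from the two off-diagonal counts. This is a minimal sketch, assuming the 2x2 table is entered with the 1st rating in rows and the 2nd rating in columns; the names `tab`, `stat`, `p_first` and `p_second` are illustrative and do not come from the slides.

```r
# Approval ratings: rows = 1st rating, columns = 2nd rating.
tab <- matrix(c(794, 150,
                 86, 570),
              byrow = TRUE, ncol = 2,
              dimnames = list(first  = c("Approve", "Disapprove"),
                              second = c("Approve", "Disapprove")))

# Only the movers (off-diagonal cells) enter the statistic.
n12 <- tab[1, 2]  # approve -> disapprove
n21 <- tab[2, 1]  # disapprove -> approve

# McNemar statistic without continuity correction, compared to chi-square(1).
stat    <- (n21 - n12)^2 / (n12 + n21)
p_value <- pchisq(stat, df = 1, lower.tail = FALSE)

# Same test via the built-in function, for comparison.
test <- mcnemar.test(tab, correct = FALSE)

# Marginal approval proportions at the 1st and 2nd rating.
p_first  <- sum(tab[1, ]) / sum(tab)
p_second <- sum(tab[, 1]) / sum(tab)

print(c(statistic = stat, p_value = p_value))
print(c(p_first = p_first, p_second = p_second))
```

The stayers (794 and 570) cancel out of the comparison of the two margins, which is why the statistic depends only on the 150 and 86 movers.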
