Statistics - New York University

Part 24 – Statistical Tests:3
Statistics and Data
Analysis
Professor William Greene
Stern School of Business
IOMS Department
Department of Economics
A Bivariate Latent Class Correlated
Generalised Ordered Probit Model with
an Application to Modelling Observed
Obesity Levels
William Greene
Stern School of Business, New York University
With Mark Harris, Bruce Hollingsworth, Pushkar Maitra
Monash University, Melbourne
Health Econometrics Workshop
December 4-6, 2008
University of Milan - Bicocca
Introduction
Most common measure = Body Mass Index (BMI):
$BMI = \text{weight (kg)} / \text{height (m)}^2$
WHO guidelines:
BMI < 18.5: underweight
18.5 ≤ BMI < 25: normal
25 ≤ BMI < 30: overweight
BMI ≥ 30: obese
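The BMI formula and the WHO bands can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names are made up here, and values that fall exactly on a cut-off (18.5, 25, 30) are assumed to belong to the higher band.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index: weight in kilograms divided by height in metres squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def who_category(bmi_value: float) -> str:
    """Map a BMI value to a WHO band (boundary values go to the higher band)."""
    if bmi_value < 18.5:
        return "underweight"
    if bmi_value < 25:
        return "normal"
    if bmi_value < 30:
        return "overweight"
    return "obese"


if __name__ == "__main__":
    # Hypothetical person: 80 kg, 1.75 m tall.
    value = bmi(80.0, 1.75)
    print(round(value, 1), who_category(value))  # prints: 26.1 overweight
```

Keeping the BMI computation separate from the banding makes it easy to swap in different cut-offs if another guideline is preferred.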




Around 300 million people worldwide are obese, a figure
likely to rise
The International Obesity Taskforce (http://www.iotf.org)
calls obesity one of the most important medical and public
health problems of our time.
Obesity is defined as a condition of excess body fat, associated with a
large number of debilitating and life-threatening disorders
Health experts argue that given an individual’s height, their
weight should lie within a certain range
Costs of Obesity
In the US, more people are obese than smoke or use illegal drugs
Obesity is a major risk factor for (non-communicable) diseases such as heart problems and cancer
Obesity is also associated with:
- lower wages and productivity, and with absenteeism
- low self-esteem
And is costly to society:
- USA: around 4-8% of all annual health care expenditure (US$100 billion)
- Canada, 5%; France, 1.5-2.5%; and New Zealand, 2.5%
Background
Behavioural adjustments, such as changes to diet and
increased physical activity, can be made if perceived
benefits exceed costs

So it is clearly an enormous public health
issue worldwide!
This is a growing area of research, but to date
there have been relatively few economic and
econometric analyses of obesity
It has also been argued that obesity is, in part,
an economic phenomenon
Obesity is seen as potentially avoidable
An Ordered Probit Approach
A Latent Regression Model for “True BMI”
$BMI^* = \beta_1 x_1 + \beta_2 x_2 + \dots + \varepsilon$
“True BMI”, a proxy for weight, is unobserved
Observation Mechanism for Weight Type
$WT = 0$ if $BMI^* < 0$ (Normal)
$WT = 1$ if $0 < BMI^* < \mu$ (Overweight)
$WT = 2$ if $BMI^* > \mu$ (Obese)
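The observation mechanism above implies a probability for each weight type once a distribution is chosen for the error term. The sketch below assumes a standard normal error (which is what "probit" usually means), writes the coefficients as `beta` and the upper threshold as `mu` (names chosen here for illustration), and uses made-up parameter values. It is not the estimation code behind the presentation.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def weight_type_probs(x, beta, mu):
    """Return (P(Normal), P(Overweight), P(Obese)) for one individual.

    x, beta : equal-length sequences of covariates and coefficients
    mu      : upper threshold, must be positive
    """
    if len(x) != len(beta):
        raise ValueError("x and beta must have the same length")
    if mu <= 0:
        raise ValueError("mu must be positive")
    index = sum(b * v for b, v in zip(beta, x))   # linear index x'beta
    p_normal = normal_cdf(-index)                 # P(BMI* < 0)
    p_obese = 1.0 - normal_cdf(mu - index)        # P(BMI* > mu)
    p_overweight = 1.0 - p_normal - p_obese       # P(0 < BMI* < mu)
    return p_normal, p_overweight, p_obese


if __name__ == "__main__":
    # Hypothetical covariates and parameters, chosen only for illustration.
    probs = weight_type_probs(x=[1.0, 0.5], beta=[0.2, 0.6], mu=1.0)
    print([round(p, 3) for p in probs])
```

The three probabilities sum to one by construction, so in a likelihood each observation contributes the log of whichever probability matches its observed weight type.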
Data
US National Health Interview Survey
(2005), conducted by the National
Center for Health Statistics
Information on self-reported height and
weight, and BMI levels
Demographic information
Remove those underweight
Split sample (30,000+) by gender
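The two sample-preparation steps (dropping underweight respondents, then splitting by gender) might look like the following. The records and field names are hypothetical and are not the survey's actual variable names; the 18.5 cut-off is the WHO underweight threshold quoted earlier.

```python
from collections import defaultdict

# Hypothetical records; the field names are illustrative, not the survey's.
respondents = [
    {"sex": "female", "height_m": 1.65, "weight_kg": 48.0},
    {"sex": "female", "height_m": 1.60, "weight_kg": 70.0},
    {"sex": "male", "height_m": 1.80, "weight_kg": 82.0},
    {"sex": "male", "height_m": 1.75, "weight_kg": 98.0},
]


def prepare(records, underweight_cutoff=18.5):
    """Compute BMI, drop underweight respondents, and group the rest by sex."""
    by_sex = defaultdict(list)
    for rec in records:
        bmi = rec["weight_kg"] / rec["height_m"] ** 2
        if bmi < underweight_cutoff:
            continue  # underweight respondents are removed from the sample
        by_sex[rec["sex"]].append({**rec, "bmi": bmi})
    return dict(by_sex)


if __name__ == "__main__":
    samples = prepare(respondents)
    for sex, rows in samples.items():
        print(sex, len(rows))
```

Each group can then be modelled separately, which matches the separate results reported for females and males.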
BMI Ordered Choice Model

Here we use, conditional on class membership, lifestyle factors
Marriage comfort factor only for ‘normal’ class women
Both classes negatively associated with income, education and
exercise; effects similar in magnitude except for exercise
Exercise intensity only important for ‘non-normal’ class
Home ownership only important for ‘non-normal’ class, and
negative: result of differing socio-economic status distributions
across classes?
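The phrase "conditional on class membership" can be made concrete with a generic two-class mixture. The notation below is an assumption introduced for illustration: $\pi_c$ is the probability of belonging to class $c$, and each class has its own ordered-choice probabilities. The model in the presentation (bivariate, correlated, with generalised thresholds) is richer than this sketch.

```latex
\[
P(WT_i = j \mid x_i)
  = \sum_{c \,\in\, \{\text{normal},\ \text{non-normal}\}}
    \pi_c \, P(WT_i = j \mid x_i,\ \text{class } c),
\qquad \pi_c \ge 0, \quad \sum_c \pi_c = 1 .
\]
```

Read this way, a statement such as "only important for the non-normal class" refers to a coefficient inside one class-specific probability, not to the class shares $\pi_c$.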
Males

Marriage comfort factor for both classes; higher for normal class
Income positively associated with weight levels for normal class
Home ownership: unlike females, only normal class affected, and
positively
Education negatively associated, and of a similar magnitude
Exercise: as with females, negatively associated with weight, but
now more pronounced in non-normal class
Vigorous exercise only important for ‘non-normal’ class
Effects of Aging on Weight Class
Effect of Education
Effect of Income