ANALYSIS AND COMPARISON OF BLOOD LEAD RISK IN INDIANA

ANALYSIS AND COMPARISON OF BLOOD LEAD RISK
AREA MODELS FOR SELECTED URBAN AREAS
IN INDIANA
A THESIS
SUBMITTED TO THE GRADUATE SCHOOL
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE
MASTER OF SCIENCE
BY
YUNZHONG ZHAO
(ADVISOR: DR. KEVIN TURCOTTE)
BALL STATE UNIVERSITY
MUNCIE, INDIANA
JULY 2009
Acknowledgements
I would like to express my thanks to my advisor Dr. Turcotte for his help and
encouragement with writing this analysis. I also would like to thank committee
members Dr. Airriess and Dr. Yang for their guidance to improve this thesis. Thanks
to all professors in the Geography Department at Ball State University for their help
during my two years in the department. Thanks also to my classmates and friends
Bernard, Andrea, Matt, and Michael for providing help and happiness during my
study. Last but not least, I would like to thank my parents for their constant love and
encouragement throughout my life.
This is dedicated to my parents, for their love and support through my life.
Acknowledgements ................................................................ ii
LIST OF TABLES ................................................................... v
LIST OF FIGURES ............................................................... viii
1. INTRODUCTION .................................................................. 1
    1.1 Overview ................................................................. 1
    1.2 Lead Problem in United States ............................................ 2
    1.3 Progress in Lead Prevention Research ..................................... 3
    1.4 Significance of the Study ................................................ 4
    1.5 Research Objective ....................................................... 4
2. LITERATURE REVIEW ............................................................. 5
    2.1 Factors Associated with Lead Risk ........................................ 5
    2.2 Current Models of Lead Risk .............................................. 7
        2.2.1. Geographic Resolution ............................................. 7
        2.2.2. Analysis Methods .................................................. 9
        2.2.3. Extended Analysis of Current Models .............................. 11
3. DATA, METHODOLOGY, PROCEDURES ................................................ 12
    3.1 Data .................................................................... 12
        3.1.1. Study Area ....................................................... 12
        3.1.2. Description of the Data .......................................... 12
        3.1.3. Preparation of the Data .......................................... 13
            3.1.3.1. US Census SF1 and SF3 Data ................................. 13
            3.1.3.2. Preparation of Children’s BLLs Data ........................ 16
            3.1.3.3. Preparation of GIS Data .................................... 16
        3.1.4. Software ......................................................... 17
    3.2 Methodology ............................................................. 17
        3.2.1. Selection of Geographic Resolution ............................... 17
        3.2.2. Selection of Urban Areas ......................................... 17
        3.2.3. Model Building ................................................... 21
            3.2.3.1. Selection of Independent and Dependent Variables ........... 21
            3.2.3.2. Least Squares Regression ................................... 22
            3.2.3.3. Testing the Models ......................................... 23
4. RESULTS ...................................................................... 24
    4.1 Childhood Lead Screening in Indiana through Five Years .................. 24
    4.2 Children BLLs in Selected Urban Areas ................................... 25
    4.3 Independent Variables ................................................... 36
    4.4 Test the Model with Different Screened Data ............................. 37
    4.5 Model of Indiana ........................................................ 38
    4.6 Models of Selected Urban Areas .......................................... 40
        4.6.1. Muncie ........................................................... 40
        4.6.2. Evansville ....................................................... 43
        4.6.3. Indianapolis ..................................................... 45
        4.6.4. Elkhart and Goshen ............................................... 47
        4.6.5. South Bend and Mishawaka ......................................... 50
        4.6.6. Fort Wayne ....................................................... 52
        4.6.7. Northern Lake County ............................................. 54
        4.6.8. Clarksville, New Albany and Jeffersonville ....................... 57
        4.6.9. Comparison of Models Based on City Size .......................... 59
        4.6.10. Comparison of Models Based on Location .......................... 61
        4.6.11. Comparison of Models Based on Accuracy .......................... 62
5. SUMMARY AND DISCUSSION ....................................................... 66
6. REFERENCES ................................................................... 70
APPENDIX A ...................................................................... 72
LIST OF TABLES
Table
Page
Table 3.1 Calculation the Socio-Economic Information …………………..………………...14
Table 3.2 Population and location of urban areas in Indiana………………………………...18
Table 4.1 Comparison of Model Parameters for Large Urban Areas………………………...63
Table 4.2 Comparison of Model Parameters for Small Urban Areas………………………...63
Table 4.3 Comparison of Model Parameters for Northern Indiana…………………………..64
Table 4.4 Comparison of Model Parameters for Central Indiana…………………………….64
Table 4.5 Comparison of Model Parameters for Southern Indiana…………………………..64
Table 4.6 Comparison of Models of different urban areas with the Model of
Indiana……………………………………………………………………………...65
Table 7.1 Test the Normal Distribution of residuals of dependent variable
in State level Model………………………………..................................................72
Table 7.2 Result of using the stepwise for selected tracts in Indiana……………...................73
Table 7.3 Result of using the backward elimination method for selected
tracts in Indiana…………………………………………………………………….74
Table 7.4 Coefficient of Indiana Model………………………………………………………75
Table 7.5 Test the Normal Distribution of residual of dependent variable
in Muncie Model……………………………………………………………….... .77
Table 7.6 Result of using the stepwise method for selected tracts in Muncie ……………….77
Table 7.7 Coefficients of Muncie Model using stepwise method…………………………….77
Table 7.8 Result of using the backward elimination method for selected tracts in Muncie …78
Table 7.9 Coefficients of Muncie Model using the backward elimination method…………..79
Table 7.10 Test the Normal Distribution of residual of dependent variable
in Evansville Model………………………………………………………………81
Table 7.11 Result of using the stepwise for selected tracts in Evansville…………………….81
Table 7.12 Coefficients of Evansville Model using stepwise method ……………………….81
Table 7.13 Result of using the backward elimination method for selected tracts
in Evansville………………………………………………………………………82
Table 7.14 Coefficients of Evansville Model using backward elimination method………….83
Table 7.15 Test the Normal Distribution of residuals of dependent variable
in Indianapolis Model…………………………………………………………….85
Table 7.16 Result of using the stepwise for selected tracts in Indianapolis………………….86
Table 7.17 Coefficients of Indianapolis Model using stepwise method……………………..86
Table 7.18 Result of using the backward elimination method for selected tracts
in Indianapolis………………………………………………………………….. 87
Table 7.19 Coefficients of Indianapolis Model using backward elimination method……….88
Table 7.20 Test the Normal Distribution of residual of dependent variable
in Elkhart and Goshen Model…………………………………………………….90
Table 7.21 Result of using the stepwise for selected tracts in Elkhart and Goshen………….90
Table 7.22 Coefficients of Elkhart and Goshen Model using stepwise method……………...90
Table 7.23 Test the Normal Distribution of residual of dependent variable
in South Bend and Mishawaka Model…………………………………………....91
Table 7.24 Result of using the stepwise for selected tracts
in South Bend and Mishawaka…………………………………………………...91
Table 7.25 Result of using the backward elimination method for selected tracts
in South Bend and Mishawaka…………………………………………………...92
Table 7.26 Coefficients of South Bend and Mishawaka using
backward elimination method…………………………………………………….93
Table 7.27 Test the Normal Distribution of residual of dependent variable
in Fort Wayne……………………………………………………………………..94
Table 7.28 Result of using the stepwise for selected tracts in Fort Wayne…………………...94
Table 7.29 Coefficients of Fort Wayne using backward elimination method………………...94
Table 7.30 Result of using the backward elimination method for selected tracts
in Fort Wayne……………………………………………………………………..95
Table 7.31 Coefficients of Fort Wayne Model using backward elimination method………...96
Table 7.32 Test the Normal Distribution of residual of dependent variable
in Northern Lake County Model………………………………………………… 98
Table 7.33 Result of using the stepwise for selected tracts in Northern Lake County……….98
Table 7.34 Coefficients of Northern Lake County Model using stepwise method…………...98
Table 7.35 Result of using the backward elimination method for selected tracts
in Northern Lake County…………………………………………………………99
Table 7.36 Coefficients of Northern Lake County Model using
backward elimination method…………………………………………………...100
Table 7.37 Test the Normal Distribution of residual of dependent variable
in Clarksville, New Albany and Jeffersonville Model…………………………102
Table 7.38 Result of using the stepwise for selected tracts
in Clarksville, New Albany and Jeffersonville………………………………….103
Table 7.39 Coefficients of Clarksville, New Albany and Jeffersonville Model
using stepwise method…………………………………………………………..103
Table 7.40 Result of using the backward elimination method for selected tracts
in Clarksville, New Albany and Jeffersonville………………………………...104
Table 7.41 Coefficients of Clarksville, New Albany and Jeffersonville Model
using backward elimination method…………………………………………….105
LIST OF FIGURES
Figures
Page
Figure 3.1 Urban Areas in Indiana……………………………………………………………13
Figure 3.2 Preparation of US Census SF1 and SF3 data……………………………………..15
Figure 3.3 Preparation of Children’s Blood Lead Levels (BLLs) data……………………….16
Figure 3.4 Procedure for selecting urban areas……………………………………………….19
Figure 3.5 Selected urban areas in Indiana…………………………………………………...20
Figure 3.6 Least squares regression Line……………………………………………………..22
Figure 4.1 Number of Screened and EBLLs Children in Indiana from 1998 to 2002……….24
Figure 4.2 Percentage of Children Screened with EBLLs in Indiana
from 1998 to 2002……………………………………………………..25
Figure 4.3 Percentages of Children with EBLLs in Muncie from 1998 to 2002
by census tract…………………………………………………………………….26
Figure 4.4 Percentages of Children with EBLLs in Evansville from 1998 to 2002
by census tract…………………………………………………………………….27
Figure 4.5 Percentages of Children with EBLLs in Indianapolis from 1998 to 2002
by census tract…………………………………………………………………….28
Figure 4.6 Percentages of Children with EBLLs in Elkhart and Goshen from 1998 to 2002
by census tract…………………………………………………………………….29
Figure 4.7 Percentages of Children with EBLLs in South Bend and Mishawaka
from 1998 to 2002 by census tract………………………………………………..30
Figure 4.8 Percentages of Children with EBLLs in Fort Wayne from 1998 to 2002
by census tract…………………………………………………………………….31
Figure 4.9 Percentages of Children with EBLLs in Northern Lake County
from 1998 to 2002 by census tract…………………………………………………32
Figure 4.10 Percentages of Children with EBLLs in Clarksville, New Albany,
and Jeffersonville from 1998 to 2002 by census tract…………………………..33
Figure 4.11 Number of Screened and EBLLs Children in Muncie from 1998 to 2002………34
Figure 4.12 Number of Screened and EBLLs Children in Evansville from 1998 to 2002…...34
Figure 4.13 Number of Screened and EBLLs Children in Indianapolis from 1998 to 2002…34
Figure 4.14 Number of Screened and EBLLs Children in Elkhart, Goshen
from 1998 to 2002…………………………………………………………….…35
Figure 4.15 Number of Screened and EBLLs Children in South Bend and
Mishawaka from 1998 to 2002………………………………………………….35
Figure 4.16 Number of Screened and EBLLs Children in Fort Wayne from 1998 to 2002….35
Figure 4.17 Number of Screened and EBLLs Children in Northern Lake County
from 1998 to 2002……………………………………………………………….36
Figure 4.18 Number of Screened and EBLLs Children in Clarksville,
New Albany and Jeffersonville from 1998 to 2002……………………………..36
Figure 4.19 Selected Census Tracts in Indiana………………………………………………..39
Figure 4.20 Test Homoscedasticity of State Model…………………………………………..39
Figure 4.21 Residuals by Census Tract Selected Areas in Indiana…………………………...40
Figure 4.22 Selected Census Tracts in Muncie………………………………………………..41
Figure 4.23 Test Homoscedasticity of Muncie Model………………………………………..41
Figure 4.24 Residuals by Census Tract Selected Areas in Muncie…………………………...42
Figure 4.25 Selected Census Tracts in Evansville…………………………………………….43
Figure 4.26 Test Homoscedasticity of Evansville Model…………………………………….44
Figure 4.27 Residuals by Census Tract Selected Areas in Evansville………………………..44
Figure 4.28 Selected Census Tracts in Indianapolis………………………………………….46
Figure 4.29 Test Homoscedasticity of Indianapolis Model…………………………………..46
Figure 4.30 Residuals by Census Tract Selected Areas in Indianapolis……………………...47
Figure 4.31 Selected Census Tracts in Elkhart and Goshen………………………………….48
Figure 4.32 Test Homoscedasticity of Elkhart and Goshen Model…………………………..48
Figure 4.33 Residuals by Census Tract Selected Areas in Elkhart and Goshen……………...49
Figure 4.34 Selected Census Tracts in South Bend and Mishawaka………………………....50
Figure 4.35 Test Homoscedasticity of South Bend and Mishawaka Model………………….51
Figure 4.36 Residuals by Census Tract Selected Areas in South Bend and Mishawaka……..51
Figure 4.37 Selected Census Tracts in Fort Wayne…………………………………………..53
Figure 4.38 Test Homoscedasticity of Fort Wayne Model……………………………………53
Figure 4.39 Residuals by Census Tract Selected Areas in Fort Wayne ……………………...54
Figure 4.40 Selected Census Tracts in Northern Lake County……………………………….55
Figure 4.41 Test Homoscedasticity of Northern Lake County Model………………………..56
Figure 4.42 Residuals by Census Tract Selected Areas in Northern Lake County…………...56
Figure 4.43 Selected Census Tracts in Clarksville, New Albany and Jeffersonville…………58
Figure 4.44 Test Homoscedasticity of Clarksville, New Albany and Jeffersonville Model….58
Figure 4.45 Residuals by Census Tract Selected Areas in Clarksville,
New Albany and Jeffersonville………………………………………………….59
Figure 7.1 Histogram of residual of dependent variable in Indiana State…………………...72
Figure 7.2 Distribution of dependent variable to each independent variable in Indiana. …...76
Figure 7.3 Histogram of dependent variable in Muncie…………………………………......77
Figure 7.4 Distribution of dependent variable according to each independent variable in
Muncie…………………………………………………………………………....80
Figure 7.5 Histogram of residual of dependent variable in Evansville……………..……….81
Figure 7.6 Distribution of dependent variable according to each independent variable in
Evansville………………………..………………………………………..………84
Figure 7.7 Histogram of residual of dependent variable in Indianapolis……………….…….85
Figure 7.8 Distribution of dependent variable according to each independent variable in
Indianapolis………………………………………………………………..…...…89
Figure 7.9 Histogram of residual of dependent variable in Elkhart and Goshen…………….90
Figure 7.10 Histogram of residual of dependent variable in South Bend and Mishawaka…..91
Figure 7.11 Distribution of dependent variable according to each independent variable in
South Bend and Mishawaka……………………………………..………………93
Figure 7.12 Histogram of residual of dependent variable in Fort Wayne…………………….94
Figure 7.13 Distribution of dependent variable according to each independent variable in
Fort Wayne……………………………………………………..…………….….97
Figure 7.14 Histogram of residual of dependent variable in Northern Lake County………...98
Figure 7.15 Distribution of dependent variable according to each independent variable in
Northern Lake County…………………………………….……….…………..101
Figure 7.16 Histogram of residual of dependent variable in Clarksville,
New Albany and Jeffersonville………………………………………………...102
Figure 7.17 Distribution of dependent variable according to each independent variable in
Clarksville, New Albany and Jeffersonville……………………….…………..106
1. INTRODUCTION
1.1 Overview
Exposure to lead is a significant hazard to human health because it can damage the
central nervous system and impair learning and behavior even at low levels
(Canfields et al. 2003). Lead is found in many items encountered in daily life. For
example, lead may exist in older houses that contain layers of lead paint. Soils can be
another source of lead as a result of historical deposition from automobile exhaust,
lead arsenate pesticide, and industrial or incinerator emissions (CDC, 1997). Lead may
also come from ceramic ware or from occupational hazards such as lead mining. Young
children are more vulnerable to lead than adults for two reasons. First, they may
ingest contaminated dust and soil as a result of normal mouthing activity. Second,
they take in more lead as a proportion of body mass and absorb more lead than do
adults (Mushak, 1992). The excessive absorption of lead may cause many problems
for young children, such as learning and behavioral disorders, decreased mental
ability and intelligence quotient (IQ), hearing impairment, delayed development,
decreased attention span, delinquency and criminal behavior, and other nervous
system problems (Oyana and Margai, 2007). At very high levels, lead can even cause
coma, convulsions, and death in children (Rappazzo et al., 2007).
1.2 Lead Problem in United States
In the United States, the cost of health effects associated with lead exposure
is estimated to be $43.4 billion each year, which is much more than the costs of other
childhood diseases of environmental origin (Landrigan et al., 2002). According to the
2003-2004 National Health and Nutrition Examination Survey (NHANES), 2.3% of
children between one and five years old in the United States have blood lead levels
(BLLs) at or above the Centers for Disease Control and Prevention (CDC) blood action
level of 10 µg/dL, which corresponds to more than 500,000 children (Kim et al., 2008).
The federal government responded to the lead problem by lowering the recommended
screening BLL from 60 µg/dL in the 1960s to 40 µg/dL in 1971, 30 µg/dL in 1978,
25 µg/dL in 1985, and finally 10 µg/dL in 1991 (Oyana and Margai, 2007). In addition,
the 1978 enactment of the Lead-Based Paint Poisoning Prevention Act banned the use
of lead-based paint nationally. However, a large number of houses built before 1978
remain, and many of them were built before 1950. Lead can be released when paint
deteriorates or is spread during renovation. Historical deposition from automobile
exhaust and from factories that emitted lead can also still affect children living in
adjacent areas.
According to the Indiana State Department of Health (2004), the most
common source of lead exposure for children is lead-based paint in older
houses, especially houses built prior to 1950. Indiana has 71,711 houses built prior
to 1950, ranking 11th in the nation. Indiana’s proportion of housing built prior to 1950
is 28.3%, which is higher than the national average of 22.3%. Children who live
in these old houses have a high likelihood of being affected by the lead in and around
them.
1.3 Progress in Lead Prevention Research
In order to reduce elevated BLLs in children, the CDC provided grants to
state and local agencies for screening children for blood lead levels. For example, in
2003, the CDC provided $31.7 million to 42 states and local health departments to
develop and implement comprehensive lead poisoning prevention efforts. Margai
(1998) claimed that even though the CDC recommended universal screening, not all
children were being tested. As a result, both neighborhoods and geographic clusters of
highly exposed individuals might not be adequately screened. Margai also pointed out
the need for an efficient lead monitoring and prevention plan.
GIS can be used to map and evaluate childhood lead risk areas. According to
recent studies in New York and North Carolina, the level of lead risk is linked with
the age of housing, house value, percentage of renter-occupied houses, percentage of
children in poverty, percentage of one-parent households, median household income,
percentage of African American residents, and percentage of Hispanic residents.
Several statistical models have been built based on these variables. In addition, some
research has focused on how to map lead exposure risk areas at different levels of
geographic resolution. For example, some researchers used the census block, census
tract, or tax parcel as the basic unit for GIS models that direct childhood lead
poisoning prevention. However, only a few of these studies focused on the geographic
factors in these models.
1.4 Significance of the Study
The key aim of this study is to determine which parameters, based on selected
urban areas, are the best predictors of elevated BLLs (blood lead levels) in children.
Differences in model parameters will be examined for different city sizes and
geographical locations. In order to compare these models, it is necessary to
build them using the same procedure and standard.
1.5 Research Objective
The primary purpose of this study is to identify the effects of the size and
location of urban areas on the parameters of the statistical models.
The specific objectives of this thesis are:
1) To describe the changes in elevated BLLs in Indiana from 1998 to 2002.
2) To standardize a method for creating statistical models to predict census tracts
with high percentages of elevated BLLs.
3) To compare and examine the models from urban areas of different size and
location based on the parameters of the generated models.
4) To compare the models generated for urban areas to a state-level model.
5) To create residual maps based on the difference between the values generated
by the models and the actual screened values.
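The residual in objective 5 can be illustrated with a minimal sketch. It assumes
that a residual is the observed percentage of screened children with elevated BLLs
minus the percentage predicted by a model; this sign convention, the tract
identifiers, and all values below are hypothetical and are not data from this study.

```python
# Hypothetical census tracts: observed vs. model-predicted percentage of
# screened children with elevated BLLs (illustrative values only).
observed = {"tract_A": 12.0, "tract_B": 4.5, "tract_C": 8.0}
predicted = {"tract_A": 9.5, "tract_B": 6.0, "tract_C": 8.0}


def residuals(observed, predicted):
    """Return observed minus predicted for every tract present in both inputs."""
    return {t: observed[t] - predicted[t] for t in observed if t in predicted}


res = residuals(observed, predicted)
for tract, r in sorted(res.items()):
    # Assumed convention: a positive residual means the model under-predicts.
    label = "under-predicted" if r > 0 else "over-predicted" if r < 0 else "exact"
    print(tract, r, label)
```

Mapping these values by tract would show where a model under- or over-predicts
relative to the screening data.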
2. LITERATURE REVIEW
2.1 Factors Associated with Lead Risk
Cromley and McLafferty (2002, p. 9) claimed that “if the cause of the disease
is believed to be environmental, then we would expect disease risk to be higher in
those geographical areas where environmental risk is higher.” The level of lead risk
has been linked with socio-economic factors and age of housing stock. Talbot et al.
(1998) linked high prevalence of elevated BLLs with areas of older housing stock, a
smaller proportion of high school graduates, and a larger proportion of black births in
their research on childhood blood lead levels in New York State. A follow-up
study of blood lead levels in New York State children born between 1994 and 1997
found the same relationship (Haley and Talbot, 2004). Dwyer (1998) used the age of
housing stock, land use, and road distance to evaluate risk areas in Australia.
Bruenling et al. (1999) tested the dietary calcium intake of urban children and claimed
that both lead exposure and low dietary calcium pose significant health risks to urban
minority children.
Because the demolition of old houses can be a source of lead, researchers
have studied the dust caused by demolition. Farfel et al. (2003) studied dust-fall
samples collected from fixed locations within ten meters of three demolition sites to
describe lead dust changes in the surrounding environment. In addition, lead dust can
also spread through remedial or removal activities at Superfund sites. Khoury and
Diamond (2003) studied modeling approaches for assessing potential risk to children
from air lead emissions from remedial or removal activities at Superfund sites in West
Dallas, Texas. They used the Environmental Protection Agency (EPA) Integrated
Exposure Uptake Biokinetic (IEUBK) model and the International Commission on
Radiological Protection (ICRP) lead model to simulate blood lead concentrations in
children. These studies examined the sources of lead at demolition and Superfund sites
and revealed the variables that affected the residences surrounding these sites.
Other researchers have studied the seasonality of children’s blood lead levels.
Laidlaw et al. (2005) explored the temporal relationship between children’s blood lead
levels and weather, soil moisture, and dust in Indianapolis, Indiana; Syracuse, New
York; and New Orleans, Louisiana. In this research, the average children’s blood Pb
(BPb) concentration in each city was computed from the children’s BPb
measurements for each month and regressed against the independent
variables of average monthly soil moisture, particulate matter < 10 µm in diameter,
wind speed, and temperature. The results showed that the seasonal resuspension of
Pb-contaminated soil in urban atmospheres is controlled by soil moisture and climate
fluctuations. Higher urban atmospheric Pb loading rates are present during periods
of low soil moisture and within areas of Pb-contaminated surface soils.
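The kind of regression described above can be sketched as follows. This is only an
illustration of ordinary least squares with four monthly predictors, not the code or
data of Laidlaw et al. (2005): the helper name ols, the monthly values, and the
coefficients used to build the synthetic response are all assumptions.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X is a list of rows, each starting with 1.0 for the intercept;
    y is the list of responses. Returns the coefficient list b.
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (A[i][k] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b


# Synthetic monthly predictors (illustrative values, not measurements).
soil_moisture = [0.30, 0.32, 0.28, 0.25, 0.20, 0.15, 0.12, 0.10, 0.16, 0.22, 0.27, 0.31]
pm10 = [20.0, 22.0, 25.0, 28.0, 33.0, 38.0, 41.0, 43.0, 36.0, 30.0, 24.0, 21.0]
wind_speed = [4.5, 4.8, 5.0, 4.6, 4.0, 3.6, 3.3, 3.2, 3.7, 4.1, 4.4, 4.6]
temperature = [-2.0, 0.0, 5.0, 11.0, 17.0, 22.0, 24.0, 23.0, 19.0, 12.0, 6.0, 0.0]

X = [[1.0, s, p, w, t]
     for s, p, w, t in zip(soil_moisture, pm10, wind_speed, temperature)]

# Synthetic monthly mean BPb built from assumed coefficients so the fit can be checked.
assumed = [5.0, -4.0, 0.05, -0.2, 0.03]
bpb = [sum(c * x for c, x in zip(assumed, row)) for row in X]

beta = ols(X, bpb)
fitted = [sum(c * x for c, x in zip(beta, row)) for row in X]
print([round(b, 3) for b in beta])
```

Because the synthetic response is an exact linear combination of the predictors, the
fitted values reproduce it; real monthly data would leave nonzero residuals.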
Researchers have also focused on changes in children’s blood lead levels
related to compliant housing. Rappazzo et al. (2007) found that blood lead level
changes were not significantly different between children in compliant housing and
those living in noncompliant housing for periods of 1.5 to 2 years, 2 to 3 years, or
more than 3 years in Philadelphia. This study also pointed out that many factors might
influence blood lead levels, including the age of a subject, gender, season, the time of
the test, the diet of the subject immediately before the test, and the possible presence
of lead on a subject’s skin. Blood lead results do not reflect the total body burden
of lead, given lead’s half-life of 30 to 60 days in the blood.
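The effect of a 30 to 60 day half-life can be made concrete with a small sketch. It
assumes simple first-order (exponential) elimination from blood, which is a
simplification introduced here for illustration and ignores exchange with other body
compartments; the function name is hypothetical.

```python
def fraction_remaining(days, half_life_days):
    """Fraction of an initial blood lead level remaining after `days`,
    assuming simple first-order (exponential) elimination."""
    return 0.5 ** (days / half_life_days)


# Compare the two ends of the half-life range quoted in the text.
for half_life in (30, 60):
    print(half_life, round(fraction_remaining(90, half_life), 3))
```

Under this assumption, a single blood test mostly reflects exposure during the
preceding few months, which is consistent with the point that it does not measure
total body burden.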
2.2 Current Models of Lead Risk
Because screening databases can be highly biased in representing
the geographical distribution of a health problem (Cromley and McLafferty, 2002),
the choice of screening areas is important in building models of lead risk to
children. Factors that affect the results include the geographic resolution and the
choice of regression method.
2.2.1. Geographic Resolution
Geographic resolution can affect the model results in different ways. Fine
geographic resolutions such as census blocks or tax parcels can increase
geographic accuracy but reduce the significance of the model; coarse geographic
resolutions such as census tracts or zip code areas are crude in geographic accuracy
but contribute to the significance of the model. Researchers have used different
geographic resolutions, such as zip code areas, census tracts, or tax parcels. There is
no standard for judging which resolution is best because data collection methods,
procedures, and accessibility differ.
Zip code areas or merged zip code regions are chosen in many studies because
most data from health departments are geocoded at the zip code level.
Talbot et al. (1998) used merged zip code regions as the units of observation in their
research on children’s blood lead levels in New York. Haley and Talbot (2004) extended
this research, again using merged zip code regions. Using zip code areas or merged
zip code regions can minimize the error created by transforming the data to other
resolutions such as census blocks or census tracts, whose boundaries do not
completely match the boundaries of zip code areas.
However, compared with zip code areas, census tracts are more
sensitive in pinpointing a greater number of older housing units because
socio-economic factors are more similar within a census tract than within the larger
zip code area. Reissman et al. (2001) reached this conclusion in their analysis of
childhood BPb levels and residential locations of at-risk children screened from 1996
through 1997, and of the number and location of homes where more than one child had
been poisoned by lead from 1994 through 1998, in Jefferson County, Kentucky.
Griffith et al. (1998) compared census blocks, census
block groups, and census tracts in research on childhood blood lead levels in Syracuse,
New York. The results indicated that census tracts and census block groups appear to be
suitable resolutions for building sound, model-based statistical inferences. However, the
census block level of aggregation is too sparse to achieve satisfactory statistical
models.
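The sparsity problem at fine resolutions can be illustrated with a short sketch. It
aggregates the same hypothetical screening records at the tract level and at the
block level and reports the share of units that contain enough screened children.
The records, the identifiers, and the min_screened threshold are assumptions made
for illustration, not values from Griffith et al. (1998).

```python
from collections import Counter

# Hypothetical screening records: (tract_id, block_id) for each screened child.
records = (
    [("T1", "T1-B1")] * 6 + [("T1", "T1-B2")] * 3 + [("T1", "T1-B3")] * 1
    + [("T2", "T2-B1")] * 2 + [("T2", "T2-B2")] * 8
)


def share_of_usable_units(unit_ids, min_screened):
    """Share of units that hold at least `min_screened` screened children."""
    counts = Counter(unit_ids)
    usable = sum(1 for n in counts.values() if n >= min_screened)
    return usable / len(counts)


tract_share = share_of_usable_units([t for t, _ in records], min_screened=5)
block_share = share_of_usable_units([b for _, b in records], min_screened=5)
print(tract_share, block_share)
```

With these made-up records, every tract meets the threshold while most blocks do
not, which is the trade-off between geographic accuracy and statistical reliability
described above.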
The tax parcel unit is also used as the geographic resolution in some studies.
Miranda et al. (2002) used tax parcels to study the potential lead risk for children in
North Carolina. Kim et al. (2008) built three childhood lead exposure risk models
based on tax parcels. They claimed that “the highly resolved models allow
communities to target the highest-risk homes more cost-effectively and to create and
implement targeted intervention programs” (p. 1735).
Higher resolution can be achieved at the point level, which locates each
individual screening location. An example is the Robert et al. (2003) study of old
housing and lead screening in Charleston County, South Carolina. They found that
children living in pre-1950 housing were at higher risk for lead poisoning, and they
also found a large number of cases in an area of newer houses that was near a
potential point source of lead.
2.2.2. Analysis Methods
In recent studies of children with elevated BLLs, different regression
methods were used for building lead risk assessment models. One of the regression
methods is least squares regression. Talbot et al. (1998) used this method to examine
children’s BLLs and community characteristics. They used percent of houses built
before 1940, percent of houses built before 1950, percent of houses vacant, adults age
25 and older who graduated from high school, percent of children under 5 years living
below the poverty level, percent of Hispanic, percent of Black, percent of population
that rents a home, and population density as variables. The log-transformed
percentage of children with elevated BLLs in each zip code was chosen as the
dependent variable. In addition, all zip codes with fewer than 100 children tested were
merged to create new zip code regions. The analysis process included examining the
bivariate associations of each variable with the dependent variable and using
diagnostic methods to detect model errors. Haley and Talbot (2004) extended
this research by building a simultaneous autoregressive model. Another example is the
research on childhood lead poisoning in North Carolina. Miranda et al. (2002) used
log-linear regression to generate models for six counties in North Carolina. They used
ANOVA to drop three variables, thereby using a total of six predictors. The dropped
variables were: percentage of children in poverty, percentage of one-parent
households, and percentage of renter-occupied housing.
Logistic regression methods are also used in building models for evaluating
childhood blood lead risk areas. Oyana and Margai (2007) used logistic regression
models to approximate the risks of childhood lead poisoning in six neighborhoods in
Chicago. The Northside neighborhood of Chicago was used as a reference area in the
procedure and the result showed that the Westside neighborhoods faced the greatest
risk of lead poisoning.
Researchers have employed not only statistical analysis but also spatial and
geostatistical methods. An example is Margai (1998), who used GIS to generate lead
case clusters; buffers around environmental sources, including factories and other
lead-related facilities; buffers around automobile-related facilities such as gas
storage; and buffers around environmental pathways such as rail corridors in
Binghamton, New York. Spatial analysis found that nearly six out of ten lead case
clusters were within these buffers. In addition, geographic factors and demographic
variables were combined using canonical coefficients to delineate high-risk
areas for lead poisoning. Oyana and Margai (2007) used kriging to predict unknown
values from observed data at known locations using a fitted semivariogram model.
The kriged maps provided a smooth surface of locations with high prevalence rates
and clearly captured the overall spatial pattern of decline. Haley and Talbot (2004)
used a spatial error model to test four different weight matrices: first-order neighbors,
second-order neighbors, inverse distance to 25 km, and inverse distance squared to 25
km.
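The two inverse-distance weight matrices can be illustrated with a minimal Python sketch. It assumes planar centroid coordinates in kilometres and sets a weight to zero beyond the 25 km cutoff; the function name, the sample centroids, and the absence of row standardization are assumptions made for illustration and are not taken from Haley and Talbot (2004).

```python
import math


def inverse_distance_weights(points, cutoff_km=25.0, power=1):
    """Build an n x n weight matrix from planar (x, y) coordinates in km.

    w[i][j] = 1 / d_ij**power when 0 < d_ij <= cutoff_km, otherwise 0.
    """
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a unit is not its own neighbor
            d = math.dist(points[i], points[j])
            if 0 < d <= cutoff_km:
                weights[i][j] = 1.0 / d ** power
    return weights


# Hypothetical centroids (km); not data from any of the cited studies.
centroids = [(0.0, 0.0), (3.0, 4.0), (30.0, 0.0)]
w_inverse = inverse_distance_weights(centroids, power=1)
w_inverse_squared = inverse_distance_weights(centroids, power=2)
```

Squaring the distance makes the weights fall off faster, so nearby units dominate more strongly than under the plain inverse-distance scheme.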
2.2.3. Extended Analysis of Current Models
Researchers to date have created models that focus on areas such as an
entire state, certain counties, or cities. For example, Talbot et al. (1998) generated a
model for New York State, Miranda et al. (2002) generated models for six
selected counties in North Carolina, and Griffith et al. (1998) generated models for
Syracuse, New York. No researchers have linked city or county models with a state
model, or related models created for different urban areas. Given this gap, an
extended analysis of current models would test whether a state-level model is suitable
for all urban areas, or whether a model generated for one urban area can be used for
another. A further extension would examine the differences among the
models according to the geographic location and size of the urban areas.
3. DATA, METHODOLOGY, PROCEDURES
3.1 Data
3.1.1. Study Area
The primary study area of this thesis is Indiana, which is located in the
Midwest region of the United States. It borders Illinois to the west, Ohio to the east,
Kentucky to the south, and Michigan to the north. There are thirteen urban areas in
Indiana (Figure 3.1). According to the 2000 U.S. Census, the population of Indiana is
6,080,485 and the number of children under six years old is 595,896, which is about
9.8 percent of the total population.
3.1.2. Description of the Data
The data for this thesis consist of three parts. The first part is the
socio-economic data, sourced from the 2000 U.S. Census Summary File 1 (SF1,
100-Percent Data) and Summary File 3 (SF3), which is based on sample data. The
second part is the children’s BLL information from 1998 to 2002. These data are from
the Indiana State Department of Health and are aggregated by census tract. The third
part is a digital map of Indiana, which comes from the Indiana Geological Survey.
Figure 3.1: Urban Areas in Indiana
3.1.3. Preparation of the Data
3.1.3.1. US Census SF1 and SF3 Data
US Census SF1 contains most of the population information but lacks
details such as the age of buildings; SF3 provides this information. Social,
economic, and housing information was extracted from both SF1 and SF3 of the
2000 US Census using the software DataFerrett. Both SF1 and SF3 contain census
tract data. The census tract number in SF1 was used as a foreign key, and the census
tract number in SF3 was used as the primary key. The Join Table function in ArcGIS
was used to link the two tables, generating a new table that contained information
from both. Based on the new table, several parameters were calculated. Table 3.1 and
Figure 3.2 show this procedure.
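The join-and-divide step can be sketched in a few lines of Python. This is an illustration only: the thesis performed the join in ArcGIS, and the field names (`tract`, `total_pop`, `black_pop`, `total_housing`, `built_before_1950`) and sample values below are hypothetical.

```python
def join_by_tract(sf1_rows, sf3_rows):
    """Inner-join two lists of records on the shared 'tract' key."""
    sf3_by_tract = {row["tract"]: row for row in sf3_rows}
    joined = []
    for row in sf1_rows:
        match = sf3_by_tract.get(row["tract"])
        if match is not None:
            joined.append({**row, **match})
    return joined


def ratio(dividend, divisor):
    """Return dividend / divisor, or None when the divisor is zero."""
    return dividend / divisor if divisor else None


# Hypothetical sample records (not thesis data).
sf1 = [{"tract": "000100", "total_pop": 4000, "black_pop": 1000}]
sf3 = [{"tract": "000100", "total_housing": 1600, "built_before_1950": 400}]

table = join_by_tract(sf1, sf3)
for rec in table:
    rec["ratio_black"] = ratio(rec["black_pop"], rec["total_pop"])
    rec["ratio_pre1950"] = ratio(rec["built_before_1950"], rec["total_housing"])
```

Returning `None` for a zero divisor keeps empty tracts visible in the joined table instead of raising an error, which is one possible design choice among several.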
Table 3.1: Calculation of the Socio-Economic Information

Dividend | Divisor | Results
--- | --- | ---
Total Black Population | Total Population | The Ratio of Black
Total Hispanic Population | Total Population | The Ratio of Hispanic
Rental Housing | Total Housing | The Ratio of Rental Housing
Vacant Housing | Total Housing | The Ratio of Vacant Housing
Total Population of High School or above Education | Total Housing | The Ratio of High School or above Education
Housing Built before 1950 | Total Housing | The Ratio of Housing Built before 1950
Housing Built before 1980 | Total Housing | The Ratio of Housing Built before 1980
One Parent Families | Total Families | The Ratio of One Parent Family
Families in Poverty having Children under Five-Years-Old | Total Families | The Ratio of Children under Five Years Living below the Poverty Level
Families in Poverty having Children under Eighteen-Years-Old | Total Families | The Ratio of Children under Eighteen Years Living below the Poverty Level
[Flow diagram: fields from US Census SF1 and SF3 (total population, total Black, total Hispanic, total housing, rental housing, vacant housing, age of housing, education, total families, total families in poverty, one parent family) are combined into the ratios listed in Table 3.1, together with Average Household Size and Average Family Size.]
Figure 3.2: Preparation of US Census SF1 and SF3 Data
3.1.3.2. Preparation of Children’s BLLs Data
The children’s BLL data for Indiana are collected by the Indiana State
Department of Health. The data include screening information from 1998 to 2002 for
each census tract. The geocoding procedure includes two steps: first, the address
record is geocoded to an interpolated street range; second, if the address record
cannot be matched to a street, it is mapped to a zip code centroid. Each record
contains the number of screened children and the number of children with BLLs of
10 µg/dL or higher for each tract between 1998 and 2002. Figure 3.3 illustrates the
procedure for the children’s BLL data.
[Flow diagram: Total Children under Six (US Census SF3) and Num of Screened Children (children’s BLLs information) give the Percentage of Screened; Num of Screened Children and Num of BLLs >= 10 ug/dL give the Percentage of elevated blood lead levels (EBLLs).]
Figure 3.3: Preparation of Children’s Blood Lead Levels (BLLs) data
3.1.3.3. Preparation of GIS Data
The GIS data was obtained from the Indiana Geological Survey. The data
included US census tracts and urban areas in Indiana. A personal GeoDatabase was
created using ArcCatalog. GIS data, related social, economic, and housing
information, and Children’s BLLs were imported into the GeoDatabase.
3.1.4. Software
The software used in this research includes ArcGIS 9.3, SPSS, DataFerrett,
Microsoft Excel, and Microsoft Access. DataFerrett was used for searching and
extracting 2000 US Census data. ArcGIS 9.3 was used to prepare the GeoDatabase and
create maps. Microsoft Excel was used to create charts. Microsoft Access was used to
extract the parameter fields for analysis. SPSS was used to generate and test the
regression models.
3.2 Methodology
3.2.1. Selection of Geographic Resolution
Previous researchers have used zip code areas, census tracts, and tax parcels
as the geographic resolution. In this research, the census tract is chosen because
socio-economic factors are more similar within a census tract than within the larger
area of a zip code (Reissman, 2001), and because the purpose of this research is to
build models linking the socio-economic factors and age of housing stock with
children’s elevated BLLs.
3.2.2. Selection of Urban Areas
According to the 2000 US Census, there are thirteen major urban areas in
Indiana. Selection of the study areas is based on three factors: population in urban
areas, location of urban areas and number of children screened in urban areas. The
population and locations are listed in Table 3.2.
Table 3.2: Population and location of urban areas in Indiana

Name | Location | Population
--- | --- | ---
Anderson | Central | 59,734
Bloomington | Central | 69,291
Clarksville-New Albany-Jeffersonville | Southern | 113,588
Elkhart-Goshen | Northern | 81,257
Evansville | Southern | 121,582
Fort Wayne | Northern | 244,296
Indianapolis | Central | 791,926
Kokomo | Central | 46,113
Lafayette-West Lafayette | Central | 85,175
Muncie | Central | 67,430
South Bend-Mishawaka | Northern | 219,361
Northern Lake County | Northern | 468,335
Terre Haute | Central | 59,614
Because the number of children screened was not uniform, the ratio of
children screened was calculated for each census tract in each year by dividing the
number of children screened by the number of children five years old and under.
All census tracts with a ratio of children screened below two percent in any
of the five years were filtered out. For example, if there were one hundred children
five years old and under in a census tract, and the number of children screened in
2000 was one, the census tract would be excluded from this study. In order to provide
sufficient data for census tracts with zero EBLL children in all five years, a higher
threshold was set for the ratio of children screened: only those census tracts with a
ratio of children screened of at least five percent were selected for analysis. That is,
if there were one hundred children, the number of children screened was four in 1999,
and the number of EBLL children in each year was zero, the census tract would be
excluded from this study. The procedure for selecting urban areas is shown in Figure 3.4.
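The two screening-ratio filters can be expressed as a short Python sketch. The function name and argument layout are hypothetical, and the reading that the five percent threshold must hold in every year is an assumption based on the worked example in the text.

```python
def keep_tract(screened_by_year, ebll_by_year, children_five_and_under):
    """Return True if a census tract passes both screening-ratio filters."""
    if children_five_and_under <= 0:
        return False  # no denominator, so no ratio can be computed
    ratios = [s / children_five_and_under for s in screened_by_year]
    # Filter 1: exclude the tract if any year has a ratio below 2 percent.
    if any(r < 0.02 for r in ratios):
        return False
    # Filter 2 (assumed to apply per year): tracts with zero EBLL children
    # in all years need a ratio of at least 5 percent.
    if sum(ebll_by_year) == 0 and any(r < 0.05 for r in ratios):
        return False
    return True


# The two examples from the text, with hypothetical values for other years.
example_low_screening = keep_tract([10, 10, 1, 10, 10], [0, 1, 0, 0, 0], 100)
example_zero_ebll = keep_tract([10, 4, 10, 10, 10], [0, 0, 0, 0, 0], 100)
```

Both examples evaluate to `False`: the first fails the two percent filter in one year, and the second has no EBLL children and a four percent ratio in one year.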
[Flow diagram: census tracts are selected by location within the urban areas to give urban tract areas; these are joined with the screen ratio table and tracts with a ratio of children screened < 2% are filtered out; tracts with zero EBLL children (Pb >= 10 ug/dL) through five years and a ratio of children screened < 5% are then filtered out, leaving the preselected urban areas.]
Figure 3.4: Procedure for selecting urban areas.
As a result, eight urban areas are selected from Indiana: South Bend and
Mishawaka; Northern Lake County; Elkhart and Goshen; Fort Wayne; Indianapolis;
Muncie; Evansville; and Clarksville, New Albany, and Jeffersonville. These urban
areas are shown in Figure 3.5.
Figure 3.5 Selected urban areas in Indiana
3.2.3. Model Building
In this research, stepwise and backward elimination methods are used to
choose the independent variables. Least squares regression methods are used to build
evaluation models for selected cities in Indiana.
3.2.3.1. Selection of Independent and Dependent Variables
The selection of the dependent variable involves considering whether the
distribution of the variable is normal (Talbot et al., 1998; Miranda et al., 2002; Haley
and Talbot, 2004). In this research, the dependent variable is the natural logarithm of
the percentage of children with BLLs >= 10 ug/dL in each census tract plus one, that
is, Ln (percentage + 1). The independent variables were chosen from Ratio of Black,
Ratio of Hispanic, Ratio of Rental Housing, Ratio of Vacant Housing, Ratio of High
School or above Education, Ratio of Housing Built before 1950, Ratio of Housing
Built before 1980, Ratio of One Parent Family, Ratio of Children under Five Years
Living below the Poverty Level, Ratio of Children under Eighteen Years Living below
the Poverty Level, Average Household Size, and Average Family Size. For each urban
area, all twelve independent variables were considered, and the most suitable ones
were selected to build the model for that urban area. When selecting the independent
variables, the following criteria were used:
1. The correlation between the independent variables and the dependent variable.
2. The explanatory power of the independent variables (Miranda et al., 2002).
3. The significant interactive effects among independent variables (Miranda et al., 2002).
3.2.3.2. Least Squares Regression
Figure 3.6: Least squares regression Line.
The least-squares regression line is the line that minimizes the sum of squared
vertical distances between each data point and the line (Figure 3.6). When
constructing a least squares regression, four criteria need to be considered (JChapman
and Charles, 2000):
1. The variables are assumed to have a linear relationship.
2. For every value of the independent variable, the distribution of residuals or
error values should be normal, and the mean of the residuals should be zero.
3. For every value of the independent variable, the variance of residual error is
assumed to be equal.
4. The value of each residual is independent of all other residual values.
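As an illustration of the least-squares line for a single predictor, the following Python sketch computes the slope and intercept from the usual closed-form expressions and then the residuals. The data points are hypothetical and the function name is chosen for this example only; the thesis models were fitted in SPSS.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical tract values: a housing ratio (x) and Ln(percentage + 1) (y).
xs = [0.1, 0.2, 0.3, 0.4]
ys = [1.0, 1.4, 2.1, 2.3]
intercept, slope = fit_line(xs, ys)

# Vertical distances between each observation and the fitted line.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
```

With an intercept in the model the residuals sum to zero (up to rounding), which corresponds to the zero-mean condition in the second criterion.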
3.2.3.3. Testing the Models
There are two methods to test the least squares regression models. The first
method is using statistical criteria, and the second method is drawing a residuals map
to display the error patterns. In this research, both of the methods are used.
The statistical criteria include the coefficient of multiple determination (R2),
which is interpreted as the proportion of variation explained, and homoscedasticity,
which describes whether the variance of the residuals is homogeneous across
levels of the predicted values. When R2 is used, an F test (or F statistic) is used to
evaluate its significance. If the probability of the F value is sufficiently small (a
small p value), one can conclude that the independent variables account for a
significant amount of the total variation in the dependent variable (McGrew and
Monroe, 2000). For homoscedasticity, a chart of regression standardized residuals
against regression standardized predicted values is created for the test.
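The relationship between R2 and the F statistic can be shown with a small Python sketch. It assumes an ordinary least squares fit with an intercept; the function name and the sample values are hypothetical. Converting F to a p value requires the F distribution, which is left to a statistical package here.

```python
def r_squared_and_f(observed, predicted, n_predictors):
    """Return (R2, F) for a least-squares fit that includes an intercept."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    r2 = 1.0 - ss_resid / ss_total
    df_model = n_predictors
    df_resid = n - n_predictors - 1
    # F compares explained variation per model degree of freedom with
    # unexplained variation per residual degree of freedom.
    f_stat = (r2 / df_model) / ((1.0 - r2) / df_resid)
    return r2, f_stat


# Hypothetical observations and fitted values from a one-predictor model.
observed = [1.0, 1.4, 2.1, 2.3]
predicted = [1.01, 1.47, 1.93, 2.39]
r2, f_stat = r_squared_and_f(observed, predicted, n_predictors=1)
```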
4. RESULTS
Results of the analysis, which include the maps, charts and models, are
presented in this chapter. More information for the model results is listed in Appendix
A. The information includes the test of normal distribution of dependent variables for
each model, R2 for the model, and the distribution of dependent variables to each
independent variable.
4.1 Childhood Lead Screening in Indiana through Five Years
The number of children screened, number of children with BLLs above 10
ug/dL, number of children with BLLs above 20 ug/dL, and the percentages from
1998 to 2002 are shown in Figures 4.1 and 4.2.
Figure 4.1: Number of Screened and EBLLs Children in Indiana from 1998 to 2002.
Figure 4.2: Percentage of Children Screened with EBLLs in Indiana from 1998 to 2002
4.2 Children BLLs in Selected Urban Areas
Figures 4.3 through 4.10 illustrate the percentage of children with elevated
blood lead levels (BLLs >= 10 ug/dL) for census tracts in the selected urban areas of
Indiana from 1998 to 2002. Figures 4.11 through 4.18 illustrate the number of
screened children and the number of children with EBLLs in the selected urban areas
from 1998 to 2002.
These figures display the change in the percentage of EBLLs among the
screened children from 1998 to 2002 for the eight selected urban areas. For most
urban areas, the percentage of EBLLs declined over time. However, in some urban
areas the value increased. One reason for an increase in the percentage of EBLLs is
a reduction in the number of children screened. For example, the 2002 increase in the
percentage of EBLLs in South Bend and Mishawaka was probably caused by a
reduction in the number of children screened. Because the value of the dependent
variable influences the selection of independent variables and the parameters of the
models, it is necessary to test whether the reduction in the number of children
screened affects the generated models. If it does, these values need to be excluded
before building the model.
Figure 4.3 Percentages of Children with EBLLs in Muncie from 1998 to 2002 by census tract
Figure 4.4 Percentages of Children with EBLLs in Evansville from 1998 to 2002 by census tract
Figure 4.5 Percentages of Children with EBLLs in Indianapolis from 1998 to 2002 by census tract
Figure 4.6 Percentages of Children with EBLLs in Elkhart and Goshen from 1998 to 2002 by census tract
Figure 4.7 Percentages of Children with EBLLs in South Bend and Mishawaka from 1998 to 2002 by census tract
Figure 4.8 Percentages of Children with EBLLs in Fort Wayne from 1998 to 2002 by census tract
Figure 4.9 Percentages of Children with EBLLs in Northern Lake County from 1998 to 2002 by census tract
Figure 4.10 Percentages of Children with EBLLs in Clarksville, New Albany, and Jeffersonville from 1998 to 2002 by census tract
Figure 4.11: Number of Screened and EBLLs Children in Muncie from 1998 to 2002
Figure 4.12: Number of Screened and EBLLs Children in Evansville from 1998 to 2002
Figure 4.13: Number of Screened and EBLLs Children in Indianapolis from 1998 to 2002
Figure 4.14: Number of Screened and EBLLs Children in Elkhart and Goshen from 1998 to 2002
Figure 4.15: Number of Screened and EBLLs Children in South Bend and Mishawaka from 1998 to 2002
Figure 4.16: Number of Screened and EBLLs Children in Fort Wayne from 1998 to 2002
Figure 4.17: Number of Screened and EBLLs Children in Northern Lake County from 1998 to 2002
Figure 4.18: Number of Screened and EBLLs Children in Clarksville, New Albany and Jeffersonville from 1998 to 2002
4.3 Independent Variables
In the process of selecting the independent variables, one needs to
consider the normal distribution of the dependent variable and the three criteria for
independent variable selection indicated in Chapter Three. The dependent variable
is calculated using the following equation:
\[
\ln(\text{Percentage of EBLLs} + 1) = \ln\left( \frac{\sum \text{Number of EBLLs in Each Year}}{\sum \text{Number of Screened Children in Each Year}} \times 100 + 1 \right)
\]
SPSS was used to check the normal distribution of the residuals of the dependent
variable.
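As a worked illustration of the dependent variable, the following Python sketch sums the yearly counts for one tract, converts them to a percentage, and takes the natural logarithm of the percentage plus one. The function name and the sample counts are hypothetical.

```python
import math


def ln_percentage_ebll_plus_one(ebll_by_year, screened_by_year):
    """Return Ln(percentage of EBLLs + 1) for one census tract."""
    total_ebll = sum(ebll_by_year)
    total_screened = sum(screened_by_year)
    percentage = total_ebll / total_screened * 100.0
    return math.log(percentage + 1.0)


# Hypothetical tract: 5 EBLL children among 100 screened over five years.
value = ln_percentage_ebll_plus_one([2, 1, 0, 1, 1], [20, 25, 15, 20, 20])

# A tract with no EBLL children gives Ln(0 + 1) = 0 instead of an
# undefined Ln(0), which is what adding one achieves.
value_zero = ln_percentage_ebll_plus_one([0, 0, 0, 0, 0], [20, 25, 15, 20, 20])
```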
For the independent variables, each of the twelve variables mentioned in
Chapter Three was calculated by averaging its value over the five years. A correlation
table with all the independent and dependent variables was generated using SPSS. In
addition, two methods were used to select the independent variables. The first is the
stepwise procedure. At each step, the independent variable not in the equation that
has the smallest probability of F is entered, provided that probability is sufficiently
small. Variables already in the regression equation are removed if their probability
of F becomes sufficiently large. The method terminates when no remaining variables
are eligible for inclusion or removal. The second method is backward elimination, in
which all variables are entered into the equation and then removed sequentially. The
variable with the smallest partial correlation with the dependent variable is
considered first for removal. If it meets the criterion for elimination, it is removed.
After the first variable is removed, the variable remaining in the equation with the
smallest partial correlation is considered next. The procedure stops when there are no
variables in the equation that satisfy the removal criteria (SPSS User’s Guide, 2006).
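The backward elimination loop can be sketched in plain Python as follows. This is not the SPSS implementation: it uses a partial F value threshold (`f_to_remove`, an illustrative value) instead of a probability of F, because the F distribution is not in the Python standard library, and all function names and data are hypothetical. The variable with the smallest partial F is also the one with the smallest partial correlation in absolute value.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def sse(rows, y, cols):
    """Residual sum of squares of y regressed on an intercept plus the given columns."""
    design = [[1.0] + [row[c] for c in cols] for row in rows]
    k = len(cols) + 1
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, d))) ** 2
               for d, yi in zip(design, y))


def backward_eliminate(rows, y, f_to_remove=2.71):
    """Drop the weakest predictor until every partial F reaches f_to_remove."""
    cols = list(range(len(rows[0])))
    while cols:
        full = sse(rows, y, cols)
        df_resid = len(y) - len(cols) - 1
        partial_f = {}
        for c in cols:
            reduced = sse(rows, y, [k for k in cols if k != c])
            partial_f[c] = (reduced - full) / (full / df_resid)
        weakest = min(partial_f, key=partial_f.get)
        if partial_f[weakest] >= f_to_remove:
            break  # every remaining variable passes the retention criterion
        cols.remove(weakest)
    return cols


# Hypothetical data: y depends on column 0 only; column 1 is unrelated.
x0 = [1, 2, 3, 4, 5, 6, 7, 8]
x1 = [1, -1, -1, 1, -1, 1, 1, -1]
y = [2.1, 4.1, 5.9, 7.9, 9.9, 11.9, 14.1, 16.1]
rows = [[a, b] for a, b in zip(x0, x1)]
kept = backward_eliminate(rows, y)
```

In this example only column 0 is retained, since removing column 1 leaves the residual sum of squares essentially unchanged.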
4.4 Test the Model with Different Screened Data
Among Figures 4.3 through 4.10, Figure 4.7 showed the most obvious
increase in the percentage of children with EBLLs from 1998 to 2002. Further
inspection of the data found that the increase was probably due to a reduction in the
number of children screened. Therefore, the South Bend and Mishawaka urban area
was chosen to test the model. Two models were generated using stepwise and
backward elimination. One used data from all five years. The other excluded the year
2002. The resulting models are given below, in that order:
Ln (Ratio of EBLLs Children + 1) = - 0.373 + 7.392 * (Ratio of Vacant Housing) + 0.661 *
(Average Family Size)
Ln (Ratio of EBLLs Children + 1) = - 0.432 + 8.189 * (Ratio of Vacant Housing) + 0.633 *
(Average Family Size)
According to the results, eliminating 2002 does not introduce any new
variables and changes the parameters only slightly. Therefore, all five years are
included in the model building procedure.
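To see how small the difference between the two fitted equations is, the following Python sketch evaluates both for one hypothetical tract. The coefficients are copied from the two equations above, in the order listed; the tract values (a vacant housing ratio of 0.10 and an average family size of 3.0) are invented for illustration.

```python
# (intercept, coefficient of Ratio of Vacant Housing, coefficient of Average Family Size)
FIRST_MODEL = (-0.373, 7.392, 0.661)
SECOND_MODEL = (-0.432, 8.189, 0.633)


def predict_ln(model, vacant_ratio, family_size):
    """Return the predicted Ln (Ratio of EBLLs Children + 1)."""
    intercept, b_vacant, b_family = model
    return intercept + b_vacant * vacant_ratio + b_family * family_size


# Hypothetical tract, not taken from the thesis data.
first = predict_ln(FIRST_MODEL, vacant_ratio=0.10, family_size=3.0)
second = predict_ln(SECOND_MODEL, vacant_ratio=0.10, family_size=3.0)
difference = first - second
```

For this tract the predictions are about 2.349 and 2.286 on the log scale, a difference of roughly 0.06.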
4.5 Model of Indiana
The selected census tract areas in Indiana are shown in Figure 4.19. Two
models are generated by the stepwise and backward elimination methods using SPSS,
and the results are the same:
Ln(Ratio of EBLLs Children + 1) = -1.223 – 0.762 * (Ratio of Education with High school
and Above) + 2.099 * (Average Family Size) - 0.997 * (Average Household Size) -1.171 *
(Ratio of Housing Built before 1980) + 2.019 * (Ratio of Housing Built Before 1950) + 0.573
* (Ratio of Children under Eighteen Years Living below the Poverty Level)
Figure 4.19 Selected Census Tracts in Indiana
The R2 value for this model is 0.558. Figure 4.20 shows the test of the
homoscedasticity of the model. Figure 4.21 shows the residuals of the model. Other
results are listed in Appendix A.
Figure 4.20 Test Homoscedasticity of State Model
Figure 4.21 Residuals by Census Tract
Selected Areas in Indiana
In Figure 4.21, areas where more EBLL children were observed than the
model predicts have positive residuals and are shown in blue, while areas where
fewer EBLL children were observed than the model predicts have negative residuals
and are shown in red. Both kinds of areas are considered census tracts with high
residuals.
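The sign convention for the residual maps can be stated compactly in Python. The sketch assumes the residual is the observed value of the dependent variable minus the predicted value; the function name and the sample values are hypothetical.

```python
def residual_color(observed_ln, predicted_ln):
    """Return (residual, map color) using residual = observed - predicted."""
    residual = observed_ln - predicted_ln
    if residual > 0:
        return residual, "blue"  # more observed than the model predicts
    if residual < 0:
        return residual, "red"   # fewer observed than the model predicts
    return residual, "none"


# Hypothetical tracts on the Ln scale.
under_predicted = residual_color(2.0, 1.5)
over_predicted = residual_color(1.0, 1.5)
```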
4.6 Models of Selected Urban Areas
4.6.1. Muncie
The selected census tract areas in Muncie are shown in Figure 4.22. Two
models are generated by stepwise and backward elimination methods using SPSS for
Muncie as below:
Model generated by stepwise:
Ln (Ratio of EBLLs Children + 1) = 3.444 - 3.2999* (Ratio of Housing Built before 1980)
Model generated by backward elimination:
Ln (Ratio of EBLLs Children + 1) = 0.848 + 6.297 * (Ratio of one Parent Family) + 7.304 *
(Ratio of Vacant Housing) -3.474 * (Ratio of Children under Eighteen Years Living below the
Poverty Level)
Figure 4.22 Selected Census Tracts in Muncie
The second model is chosen because it has a higher R2 value,
which is 0.644. Figure 4.23 shows the test of the homoscedasticity of the model.
Figure 4.24 shows the residuals of the model. Other results are listed in Appendix A.
Figure 4.23 Test Homoscedasticity of Muncie Model
Figure 4.24 Residuals by Census Tract
Selected Areas in Muncie
Using the same color pattern as the state model residuals, areas where more
EBLL children were observed than the model predicts have positive residuals and are
shown in blue, while areas where fewer EBLL children were observed than the model
predicts have negative residuals and are shown in red. Both kinds of areas are
considered census tracts with high residuals. Figures 4.27, 4.30, 4.33, 4.36, 4.39,
4.42, and 4.45 use the same color pattern.
Figure 4.24 shows that the census tracts with high residual values are
located in the southern and western areas of Muncie, and two of them are in the center
of Muncie. Most of these areas are rural, educational, or commercial areas. The
average ratio of children screened was 23.18 in the census tracts with high residual
values and 25.31 in the rest of the census tracts. Based on these observations, the
Muncie model is overall more accurate in the census tracts with a high ratio of
children screened.
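The comparison of average screening ratios between high-residual tracts and the remaining tracts can be sketched as follows. The cutoff that defines a high residual is not stated in the text, so the `high_residual_cutoff` parameter, the field names, and the sample tracts are all hypothetical.

```python
from statistics import mean


def mean_screen_ratio(tracts, high_residual_cutoff):
    """Return (mean ratio in high-residual tracts, mean ratio in the rest)."""
    high = [t["screen_ratio"] for t in tracts
            if abs(t["residual"]) >= high_residual_cutoff]
    rest = [t["screen_ratio"] for t in tracts
            if abs(t["residual"]) < high_residual_cutoff]
    return mean(high), mean(rest)


# Hypothetical tracts (not thesis data); screen_ratio is in percent.
tracts = [
    {"residual": 0.9, "screen_ratio": 20.0},
    {"residual": -0.8, "screen_ratio": 24.0},
    {"residual": 0.1, "screen_ratio": 26.0},
    {"residual": -0.2, "screen_ratio": 25.0},
    {"residual": 0.3, "screen_ratio": 27.0},
]
high_mean, rest_mean = mean_screen_ratio(tracts, high_residual_cutoff=0.5)
```

A lower mean in the high-residual group than in the rest, as in this invented sample, is the pattern reported for Muncie.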
4.6.2. Evansville
The selected census tract areas in Evansville are shown in Figure 4.25. Two models
are generated by stepwise and backward elimination methods using SPSS for
Evansville as below:
Model generated by stepwise:
Ln (Ratio of EBLLs Children + 1) = 0.757+ 10.46 * (Ratio of Vacant Housing)
Model generated by backward elimination:
Ln (Ratio of EBLLs Children + 1) = -1.270 + 0.663 * (Average Household Size) +
9.408 * (Ratio of Vacant Housing) + 1.074 * (Ratio of Housing Built Before 1950)
Figure 4.25 Selected Census Tracts in Evansville
The second model is chosen because it has a higher R2 value,
which is 0.797. Figure 4.26 shows the test of the homoscedasticity of the model.
Figure 4.27 shows the residuals of the model. Other results are listed in Appendix A.
Figure 4.26 Test Homoscedasticity of Evansville Model
Figure 4.27 Residuals by Census Tract
Selected Areas in Evansville
Figure 4.27 shows the distribution of residuals by census tract in selected areas
in Evansville. This figure uses the same color pattern as Figure 4.24. It shows that
census tracts with high residual values are distributed in the northern and southern
areas of Evansville. Reference to remote sensing data of the local area showed that
most of these places were rural, educational, or public areas. The average ratio of
children screened was 19.97 in the census tracts with high residual values and 18.47
in the rest of the census tracts.
4.6.3. Indianapolis
The selected census tract areas in Indianapolis are shown in Figure 4.28. Two
models are generated by the stepwise and backward elimination methods using SPSS,
as shown below:
Model generated by stepwise:
Ln (Ratio of EBLLs Children + 1) = - 1.08 - 1.168 * (Ratio of Housing Built before 1980) +
1.879 * (Ratio of Children under Eighteen Years Living below the Poverty Level) + 1.586 *
(Ratio of Housing Built before 1950) + 1.1 * (Average Family Size) – 2.778 * (Ratio of One
Parent Family)
Model generated by backward elimination:
Ln (Ratio of EBLLs Children + 1) = 0.867 - 0.667 * (Average Household Size) + 1.286 *
(Average Family Size) - 1.285 * (Ratio of High School or above Education) + 1.319 * (Ratio
of Housing Built before 1950) - 1.215 * (Ratio of Housing Built before 1980) + 1.626 *
(Ratio of Children under Eighteen Years Living below the Poverty Level) – 1.953 * (Ratio of
Children under Five Years Living below the Poverty Level)
Figure 4.28 Selected Census Tracts in Indianapolis
The second model is chosen because it has a higher R2 value,
which is 0.698. Because the distribution of the residuals of the dependent variable of
this model is not normal, this model is not statistically stable. Figure 4.29 shows the
test of the homoscedasticity of the model. Figure 4.30 shows the residuals of the
model. Other results are listed in Appendix A.
Figure 4.29 Test Homoscedasticity of Indianapolis Model
Figure 4.30 Residuals by Census Tract
Selected Areas in Indianapolis
Figure 4.30 shows the distribution of residuals by census tract in selected
areas in Indianapolis. It shows that the census tracts with high residual values are
distributed in the northern, southern, and western areas of Indianapolis. Reference
to the remote sensing image of the local area showed that most of these areas were
suburban, and some of the tracts were located in central Indianapolis, where many
commercial buildings are located. The average ratio of children screened was 11.87
in the census tracts with high residual values and 12.19 in the rest of the census
tracts.
4.6.4. Elkhart and Goshen
The selected census tract areas in Elkhart and Goshen are shown in Figure
4.31. The model generated by stepwise methods using SPSS for Elkhart and Goshen
is indicated below. No validated model is generated by backward elimination method
using SPSS.
Ln (Ratio of EBLLs Children + 1) = 1.447 + 4.844 * (Ratio of Hispanic)
Figure 4.31 Selected Census Tracts in Elkhart and Goshen
The R2 value for this model is 0.651. Figure 4.32 shows the test of the
homoscedasticity of the model. Figure 4.33 shows the residuals of the model. Other
results are listed in the Appendix A.
Figure 4.32 Test Homoscedasticity of Elkhart and Goshen Model
Figure 4.33 Residuals by Census Tract
Selected Areas in Elkhart and Goshen
Figure 4.33 shows the distribution of residuals by census tract in
selected areas in Elkhart and Goshen. No clear spatial pattern is apparent for the
census tracts with high residual values in this area. The residuals in
these areas are low, all of which are lower than 0.50. The average ratio of children
screened was 6.56 in the census tracts with high residual values and 13.24 in the
remaining census tracts. This indicates that the residuals are low for the census
tracts with a high ratio of children screened in Elkhart and Goshen.
4.6.5. South Bend and Mishawaka
The selected census tract areas in South Bend and Mishawaka are shown in
Figure 4.34. Two models are generated by stepwise and backward elimination
methods using SPSS for South Bend and Mishawaka. The results are the same:
Ln (Ratio of EBLLs Children + 1) = - 0.373 + 7.392 * (Ratio of Vacant Housing) + 0.661 *
(Average Family Size)
Figure 4.34 Selected Census Tracts in South Bend and Mishawaka
The R2 value of the model is 0.689. Figure 4.35 shows the test of the
homoscedasticity of the model. Figure 4.36 shows the residuals of model. Other
results are listed in the Appendix A.
Figure 4.35 Test Homoscedasticity of South Bend and Mishawaka Model
Figure 4.36 Residuals by Census Tract
Selected Areas in South Bend and Mishawaka
Figure 4.36 shows the distribution of residuals by census tract in selected
areas in South Bend and Mishawaka. The census tracts with a high value of residuals
are in the southern areas, which are suburban according to the remote sensing image
of the local area. After calculating the ratio of screened children, the average ratio in
the census tracts with a high value of residuals was 11.15, and the average ratio in the
rest of the census tracts was 12.74.
4.6.6. Fort Wayne
The selected census tract areas in Fort Wayne are shown in Figure 4.37. Two models
are generated by the stepwise and backward elimination methods using SPSS for Fort
Wayne, as shown below:
Model generated by stepwise:
Ln (Ratio of EBLLs Children + 1) = 5.269 - 3.981 * (Ratio of High School or above
Education) + 3.259 * (Ratio of Housing Built before 1950)
Model generated by backward elimination:
Ln (Ratio of EBLLs Children + 1) = - 5.224 + 2.244 * (Average Family Size) +1.887 * (Ratio
of Rental Housing) - 7.447 * (Ratio of One Parent Family) + 5.251 * (Ratio of Housing Built
before 1950) - 5.694 * (Ratio of Children under Five Years Living below the Poverty Level) +
3.594 * (Ratio of Children under Eighteen Years Living below the Poverty Level)
Figure 4.37 Selected Census Tracts in Fort Wayne
The second model is chosen because it has a higher R2 value,
which is 0.825. Figure 4.38 shows the test of the homoscedasticity of the model.
Figure 4.39 shows the residuals of the model. Other results are listed in Appendix A.
Figure 4.38 Test Homoscedasticity of Fort Wayne Model
Figure 4.39 Residuals by Census Tract
Selected Areas in Fort Wayne
Figure 4.39 shows the distribution of residuals by census tract in selected
areas of Fort Wayne. This figure uses the same pattern as Figure 4.24. It shows that
the census tracts with a high value of residuals are in the northern, eastern and southern
areas and that there is no spatial pattern in this distribution. After calculating the ratio of
children screened, the average in the census tracts with a high value of residuals was
7.40, and the average in the rest of the census tracts was 5.55. Because the tracts with
high residuals already have the higher screening ratio, increasing the ratio of screened
children might not help to reduce the residuals in the Fort Wayne model.
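The comparison of screening ratios between high-residual tracts and the remaining tracts can be sketched as follows. This is only an illustration: the thesis does not state the cutoff that defines a "high value of residuals", so the cutoff, the use of the absolute residual, the field names, and the sample tracts below are all assumptions.

```python
from statistics import mean


def compare_screening_ratio(tracts, residual_cutoff):
    """Return (mean screening ratio in high-residual tracts, mean in the rest).

    A tract counts as high-residual when the absolute value of its
    residual reaches the cutoff (an assumed definition).
    """
    high = [t["screened_ratio"] for t in tracts
            if abs(t["residual"]) >= residual_cutoff]
    rest = [t["screened_ratio"] for t in tracts
            if abs(t["residual"]) < residual_cutoff]
    return mean(high), mean(rest)


# Hypothetical tracts, not thesis data.
sample_tracts = [
    {"residual": 0.9, "screened_ratio": 8.0},
    {"residual": -0.8, "screened_ratio": 6.0},
    {"residual": 0.1, "screened_ratio": 5.0},
    {"residual": -0.2, "screened_ratio": 7.0},
]
high_mean, rest_mean = compare_screening_ratio(sample_tracts, residual_cutoff=0.5)
print(high_mean, rest_mean)
```

If the high-residual group has the lower mean, more screening in those tracts is a plausible way to reduce residuals; if it has the higher mean, as in Fort Wayne, that argument does not hold.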
4.6.7. Northern Lake County
The selected census tract areas in Northern Lake County are shown in Figure
4.40. Two models are generated by stepwise and backward elimination methods using
SPSS in Northern Lake County as shown below:
Model generated by stepwise:
Ln (Ratio of EBLLs Children + 1) = 1.255 + 7.484 * (Ratio of Vacant Housing)
Model generated by backward elimination:
Ln (Ratio of EBLLs Children + 1) = 0.218 + 1.361 * (Average Family Size) – 1.242 * (Ratio
of Hispanic) + 0.696 * (Ratio of Black) – 3.453 * (Ratio of Housing Built before 1980) –
1.160 * (Ratio of Children under Eighteen Years Living below the Poverty Level)
Figure 4.40 Selected Census Tracts in Northern Lake County
The second model is chosen because it has a higher R2 value,
which is 0.716. Figure 4.41 shows the test of the homoscedasticity of the model.
Figure 4.42 shows the residuals of the model. Other results are listed in Appendix A.
Figure 4.41 Test Homoscedasticity of Northern Lake County Model
Figure 4.42 Residuals by Census Tract
Selected Areas in Northern Lake County
Figure 4.42 shows the distribution of residuals by census tract in selected
areas in Northern Lake County. It shows that the census tracts with a high value of
residuals spread around the outside of the selected areas, where many factories and
schools are located. After calculating the ratio of children screened, the average in the
census tracts with a high value of residuals is 4.79, and the average in the rest of
census tracts is 4.85. Because the difference is so small, it is not clear whether
increasing the ratio of children screened would help to decrease the residuals of the
Northern Lake County model.
4.6.8. Clarksville, New Albany and Jeffersonville
The selected census tract areas in Clarksville, New Albany and Jeffersonville
are shown in Figure 4.43. Two models, generated by the stepwise and backward
elimination methods using SPSS for Clarksville, New Albany and Jeffersonville,
are shown below:
Model generated by stepwise:
Ln (Ratio of EBLLs Children + 1) = 3.769 - 4.215 * (Ratio of High School or above
Education) + 12.216 * (Ratio of Vacant Housing)
Model generated by backward elimination:
Ln (Ratio of EBLLs Children + 1) = 1.101 +1.133 * (Average Household Size) + 15.975 *
(Ratio of Vacant Housing) - 4.620 * (Ratio of High School or above Education) + 3.051 *
(Ratio of Housing Built before 1950) + 8.966 * (Ratio of Children under Five Years Living
below the Poverty Level) - 4.734 * (Ratio of Children under Eighteen Years Living below the
Poverty Level)
Figure 4.43 Selected Census Tracts in Clarksville, New Albany and Jeffersonville
The second model is chosen because it has a higher R2 value, which
is 0.715. Figure 4.44 shows the test of the homoscedasticity of the model. Figure 4.45
shows the residuals of the model. Other results are listed in Appendix A.
Figure 4.44 Test Homoscedasticity of Clarksville, New Albany and Jeffersonville
Model
Figure 4.45 Residuals by Census Tract
Selected Areas in Clarksville, New Albany and Jeffersonville
Figure 4.45 shows the distribution of residuals by census tract in selected
areas in Clarksville, New Albany and Jeffersonville. It shows that the census tracts
with a high value of residuals are mainly distributed on the outskirts of Clarksville,
New Albany and Jeffersonville, and two of them are located in the central area. Many
of these tracts are rural and contain large areas of forest or commercial land.
After calculating the ratio of children screened, the average in the census tracts with a
high value of residuals is 11.15, and the average in the rest of the census tracts is
12.74. From these results, it could be concluded that increasing the ratio of children
screened would help to decrease the average residual for the Clarksville, New Albany
and Jeffersonville model.
4.6.9. Comparison of Models Based on City Size
Based on the population of urban areas, the selected urban areas in Indiana
are categorized into two classes. One is large urban areas with populations greater
than 200,000. The other is small urban areas with populations less than
200,000. The large urban areas include Northern Lake County, Fort Wayne and
Indianapolis, and the small urban areas include Evansville; Elkhart and Goshen; Muncie;
Clarksville, New Albany and Jeffersonville; and South Bend and Mishawaka.
The models of large urban areas are summarized in Table 4.1. In large urban
areas, children’s EBLLs are directly associated with average family size and the
ratio of housing built before 1950, but inversely associated with the ratio of housing
built before 1980 and the ratio of children under five years living below the poverty
level, in two of the generated models of large urban areas. This means that these four
variables are likely to serve as common variables in large urban areas.
Whether the ratio of rental housing, the ratio of black, the ratio of Hispanic, the ratio
of one parent family, and the ratio of high school or above education are related to
children’s EBLLs is unclear because each appears in only one model. Two variables
change sign between models: average household size is directly associated with
children’s EBLLs in Fort Wayne but inversely associated in Indianapolis, and the ratio
of children under eighteen years living below the poverty level is directly associated
with children’s EBLLs in Fort Wayne and Indianapolis but inversely associated in
Northern Lake County.
The models of small urban areas are summarized in Table 4.2. In small urban
areas, children’s EBLLs are directly associated with the ratio of vacant housing in
four models. Children’s EBLLs are directly associated with average household
size and the ratio of housing built before 1950 in two models. The relationship
between children’s EBLLs and average family size, the ratio of Hispanic, the ratio of
children under five years living below the poverty level, the ratio of children under
eighteen years living below the poverty level, the ratio of housing built before 1980,
the ratio of one parent family, and the ratio of high school or above education is
unclear because each appears in only one model.
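The search for common variables amounts to counting, for each variable, how many models contain it and whether its sign agrees across them. The sketch below illustrates that bookkeeping with the coefficients of three equations given in Sections 4.6.5, 4.6.7 and 4.6.8; it mixes size classes on purpose and does not reproduce Table 4.1 or Table 4.2. The function name and the summary labels are hypothetical.

```python
from collections import defaultdict

# Coefficients copied from the chosen equations in Sections 4.6.5, 4.6.7, 4.6.8.
MODELS = {
    "South Bend and Mishawaka": {
        "Ratio of Vacant Housing": 7.392,
        "Average Family Size": 0.661,
    },
    "Northern Lake County": {
        "Average Family Size": 1.361,
        "Ratio of Hispanic": -1.242,
        "Ratio of Black": 0.696,
        "Ratio of Housing Built before 1980": -3.453,
        "Children under Eighteen below Poverty": -1.160,
    },
    "Clarksville, New Albany and Jeffersonville": {
        "Average Household Size": 1.133,
        "Ratio of Vacant Housing": 15.975,
        "Ratio of High School or above Education": -4.620,
        "Ratio of Housing Built before 1950": 3.051,
        "Children under Five below Poverty": 8.966,
        "Children under Eighteen below Poverty": -4.734,
    },
}


def summarize_shared_variables(models):
    """Map each variable to (number of models containing it, sign summary)."""
    signs = defaultdict(list)
    for coefficients in models.values():
        for variable, value in coefficients.items():
            signs[variable].append(value > 0)
    summary = {}
    for variable, positives in signs.items():
        if len(positives) < 2:
            label = "single model"
        elif all(positives):
            label = "positive in all"
        elif not any(positives):
            label = "negative in all"
        else:
            label = "mixed signs"
        summary[variable] = (len(positives), label)
    return summary


shared = summarize_shared_variables(MODELS)
for name, (count, label) in sorted(shared.items()):
    print(f"{name}: {count} model(s), {label}")
```

A variable labeled "single model" corresponds to the cases described as unclear, and "mixed signs" corresponds to the contradictions noted between models.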
4.6.10. Comparison of Models Based on Location
Based on the location of urban areas, the selected urban areas in Indiana are
categorized into three regional categories: South Bend and Mishawaka,
Northern Lake County, Elkhart and Goshen, and Fort Wayne in Northern Indiana;
Indianapolis and Muncie in Central Indiana; and Evansville and Clarksville, New
Albany, and Jeffersonville in Southern Indiana.
The models of the urban areas in Northern Indiana are summarized in Table
4.3. In Northern Indiana, the children’s EBLLs are directly associated with average
family size as two models contain this variable. There are some contradictions within
models for the variables of the ratio of Hispanic and the ratio of children under
eighteen years living below the poverty level. For the variables of the ratio of vacant
housing, the ratio of rental housing, average household size, the ratio of black, the
ratio of children under five years living below the poverty level, the ratio of housing
built before 1950, the ratio of housing built before 1980, and the ratio of one parent
family, only one model contains these variables.
The models for the urban areas in Central Indiana are summarized in Table
4.4. In Central Indiana, children’s EBLLs are inversely associated with the ratio
of housing built before 1980. For the variables of the ratio of vacant housing, average
household size, average family size, the ratio of children under eighteen years living
below the poverty level, the ratio of housing built before 1950, the ratio of one parent
family, the ratio of high school or above education, and the ratio of children under
five years living below the poverty level, only one model contains these
variables.
The models of the urban areas in Southern Indiana are summarized in Table
4.5. In Southern Indiana, the children’s EBLLs are directly associated with the ratio of
vacant housing, average household size, and the ratio of housing built before 1950 in
both of the models. For the variables of the ratio of children under five years living
below the poverty level, the ratio of children under eighteen years living below the
poverty level, and the ratio of high school or above education, only one model
contains these variables.
4.6.11. Comparison of Models Based on Accuracy
In this research, a state-scale model was applied to the eight urban areas and
the residuals were calculated. The error of this model was computed as the
absolute residual divided by the actual value. The errors of the urban models were
calculated by the same procedure. All the results are shown in Table 4.6.
Table 4.6 shows the difference between the urban model and the state-scale
model applied to the same area. The urban models have a lower error
than the state-scale model, and in most areas the gap is substantial. The exceptions are
Indianapolis and Elkhart and Goshen, where the state-scale model is suitable because
the difference between the state-scale model and the urban model is not
large. However, the state-scale model would cause large errors in
other areas such as Northern Lake County, Fort Wayne, or Clarksville, New Albany
and Jeffersonville.
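The error measure described above, the absolute residual divided by the actual value, can be sketched as a short function. The averaging over tracts, the expression as a percentage, and the skipping of tracts whose actual value is zero are assumptions made for this illustration; the thesis does not state how such tracts were handled. The sample values are hypothetical.

```python
def mean_percent_error(actual, predicted):
    """Average of |actual - predicted| / actual over all tracts, in percent.

    Tracts with an actual value of zero are skipped (an assumption),
    because the ratio is undefined for them.
    """
    errors = [
        abs(a - p) / a * 100.0
        for a, p in zip(actual, predicted)
        if a != 0
    ]
    return sum(errors) / len(errors)


# Hypothetical values for two tracts, not thesis data.
actual_values = [2.0, 4.0]
urban_predictions = [1.5, 5.0]
urban_error = mean_percent_error(actual_values, urban_predictions)
print(urban_error)
```

Running the same function on the state-scale predictions for the same tracts gives the second error column, so the two models are compared on identical data.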
Table 4.1 Comparison of Model Parameters for Large Urban Areas
Variable                                                                 Northern Lake County   Fort Wayne   Indianapolis
Ratio of Rental Housing                                                  --                     1.887        --
Ave HH Size                                                              --                     2.244        -0.667
Ave Family Size                                                          1.361                  --           1.286
Ratio of Black                                                           0.696                  --           --
Ratio of Hispanic                                                        -1.242                 --           --
Ratio of Children under Five Years Living below the Poverty Level        --                     -5.694       -1.953
Ratio of Children under Eighteen Years Living below the Poverty Level    -1.16                  3.594        1.626
Housing before 1950                                                      --                     5.251        1.319
Housing before 1980                                                      -3.453                 --           -1.215
Ratio of One Parent Family                                               --                     -7.447       --
Ratio of High School or above                                            --                     --           -1.285
Table 4.2 Comparison of Model Parameters for Small Urban Areas
Variable                                                                 Clarksville-New Albany-Jeffersonville   South Bend-Mishawaka   Evansville   Elkhart-Goshen   Muncie
Ratio of Vacant Housing                                                  15.975                                  7.392                  9.408        --               7.304
Ave HH Size                                                              1.133                                   --                     0.663        --               --
Ave Family Size                                                          --                                      0.661                  --           --               --
Ratio of Hispanic                                                        --                                      --                     --           4.844            --
Ratio of Children under Five Years Living below the Poverty Level        8.966                                   --                     --           --               --
Ratio of Children under Eighteen Years Living below the Poverty Level    -4.734                                  --                     --           --               --
Housing before 1950                                                      3.051                                   --                     1.074        --               --
Housing before 1980                                                      --                                      --                     --           --               -3.474
Ratio of One Parent Family                                               --                                      --                     --           --               6.297
Ratio of High School or above                                            -4.62                                   --                     --           --               --
Table 4.3 Comparison of Model Parameters for Northern Indiana
Variable                                                                 South Bend-Mishawaka   Northern Lake County   Elkhart-Goshen   Fort Wayne
Ratio of Vacant Housing                                                  7.392                  --                     --               --
Ratio of Rental Housing                                                  --                     --                     --               1.887
Ave HH Size                                                              --                     --                     --               2.244
Ave Family Size                                                          0.661                  1.361                  --               --
Ratio of Hispanic                                                        --                     -1.242                 4.844            --
Ratio of Black                                                           --                     0.696                  --               --
Ratio of Children under Five Years Living below the Poverty Level        --                     --                     --               -5.694
Ratio of Children under Eighteen Years Living below the Poverty Level    --                     -1.16                  --               3.594
Housing before 1950                                                      --                     --                     --               5.251
Housing before 1980                                                      --                     -3.453                 --               --
Ratio of One Parent Family                                               --                     --                     --               -7.447
Table 4.4 Comparison of Model Parameters for Central Indiana
Variable                                                                 Indianapolis   Muncie
Ratio of Vacant Housing                                                  --             7.304
Ave HH Size                                                              -0.667         --
Ave Family Size                                                          1.286          --
Ratio of Children under Eighteen Years Living below the Poverty Level    1.626          --
Housing before 1950                                                      1.319          --
Housing before 1980                                                      -1.215         -3.474
Ratio of One Parent Family                                               --             6.297
Ratio of Children under Five Years Living below the Poverty Level        -1.953         --
Ratio of High School or above                                            -1.285         --
Table 4.5 Comparison of Model Parameters for Southern Indiana
Variable                                                                 Clarksville-New Albany-Jeffersonville   Evansville
Ratio of Vacant Housing                                                  15.975                                  9.408
Ave HH Size                                                              1.133                                   0.663
Ratio of Children under Five Years Living below the Poverty Level        8.966                                   --
Ratio of Children under Eighteen Years Living below the Poverty Level    -4.734                                  --
Housing before 1950                                                      3.051                                   1.074
Ratio of High School or above                                            -4.62                                   --
Table 4.6 Comparison of Models of different urban areas with the Model of Indiana
Name                                         Urban Model Average Residual   Urban Model Error(%)   State Model Average Residual   State Model Error(%)
Muncie                                       0.3522                         27.704                 0.417                          33.073
Clarksville, New Albany and Jeffersonville   0.315                          27.729                 0.528                          52.371
South Bend and Mishawaka                     0.241                          11.802                 0.38                           17.637
Indianapolis                                 0.309                          21.568                 0.316                          23.139
Fort Wayne                                   0.28                           15.904                 0.521                          23.808
Evansville                                   0.255                          23.198                 0.321                          36.214
Elkhart and Goshen                           0.264                          14.191                 0.334                          15.475
Northern Lake County                         0.267                          15.113                 0.425                          22.052
5. SUMMARY AND DISCUSSION
In this research, children’s EBLLs are analyzed based on lead screening data
from Indiana. According to Figure 4.1 and Figure 4.2, the number and percentage of
children with EBLLs above 10 ug/dL and above 20 ug/dL decreased from 1998 to 2002.
Figures 4.3 through 4.10 show the details of the changes in
children’s EBLLs in the selected urban areas.
In order to test each model, an R squared value, a test of homoscedasticity,
and a residuals map are applied to each model. The R squared values for all of the
models range from 0.5 to 0.8 and most of them fall between 0.6 and 0.8. The
charts of regression standardized residuals against regression standardized predicted
values point to the validity of the generated models. The residuals maps of the eight
selected urban areas show that the census tracts with a high
value of residuals are located in the outer periphery of most urban areas and include
rural, suburban, educational or commercial areas, where fewer
residents are located. An exception to this distribution was Elkhart and Goshen. It
was also found that for some models, the residual was lower in the census tracts with
a high ratio of children screened.
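The homoscedasticity charts plot standardized residuals against standardized predicted values; an even, patternless band of points around zero supports the constant-variance assumption. The sketch below prepares such points with a plain z-score. This is a simplification and an assumption: SPSS may use a different denominator when it standardizes residuals, and the function names and sample values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores: subtract the mean, divide by the standard deviation."""
    center = mean(values)
    spread = pstdev(values)  # raises ZeroDivisionError below if all values are equal
    return [(v - center) / spread for v in values]


def homoscedasticity_points(actual, predicted):
    """Return (standardized predicted, standardized residual) pairs for plotting."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    return list(zip(standardize(predicted), standardize(residuals)))


# Hypothetical values of Ln(ratio + 1) for five tracts, not thesis data.
actual_ln = [1.2, 2.0, 2.9, 3.1, 4.3]
predicted_ln = [1.0, 2.2, 3.0, 3.3, 4.0]
points = homoscedasticity_points(actual_ln, predicted_ln)
```

Each pair can be drawn as one point of a scatter plot; a funnel shape, where the vertical spread grows with the predicted value, would point to heteroscedasticity.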
Children’s EBLLs varied across the entire state and among the different urban
areas in Indiana. In each of the urban areas, not all of the twelve parameters were used
in the final models. Through the analysis of these models, common parameters
were identified for different sizes of urban areas and for different locations.
Children’s EBLLs are correlated with socio-economic variables based on urban
area size. In large urban areas, children’s EBLLs are related to average family size,
the ratio of children under five years living below the poverty level, the ratio of
housing built before 1950, and the ratio of housing built before 1980. In smaller urban
areas, children’s EBLLs are related to the ratio of vacant housing, average
household size, and the ratio of housing built before 1950. In both large and smaller
urban areas, the models are related to the ratio of old housing stock. These results
are consistent with other research (Haley and Talbot, 2004; Kim et al., 2008).
Based on different locations, there is no common parameter for the models of
northern, central and southern Indiana. In northern Indiana, the children’s EBLLs are
associated with average family size, and in central Indiana, the children’s EBLLs are
associated with the ratio of housing built before 1980. In southern Indiana, the
children’s EBLLs are associated with the ratio of vacant housing, average household
size, and the ratio of housing built before 1950. It was found that more common
parameters existed in the southern Indiana model than in the northern and central
Indiana models.
On the other hand, some limitations are inherent in this research. Because
the information on children’s EBLLs comes from the Indiana State Department
of Health, this research cannot control the procedure for collecting samples of children
screened or the accuracy of the sample data. In order to improve the quality of the
research data, two filters were used to exclude unsatisfactory data. The filters could
increase the accuracy of the data, but they reduce the number of records, and the
possibility of error still exists, which needs to be considered in future research. In
addition, the figures and charts of children screened in each urban area from 1998 to
2002 in Chapter Four show that the changes in children’s EBLLs were not
consistent. One of the reasons is that the sample was not consistent through the
five-year period. A test was made in Chapter Four to measure whether the
inconsistent samples would change the model significantly for South Bend and
Mishawaka. The results showed that this would only change the model slightly
without introducing new parameters. However, it is not clear whether the inconsistent
samples could be ignored in other states or at different spatial resolutions such as
census blocks or zip code areas.
Another research limitation is that the socio-economic and housing
information in the census tract areas was calculated using 2000 census data. It is
assumed that this information would not change dramatically through the
five-year period. However, this information could have changed through time, and
there might be some errors related to that. In addition, the distribution of the
residuals of the dependent variable was not normal in the Indianapolis model, which
might affect the stability of the generated model.
Aside from the limitations of this research, some suggestions for future study
of children’s EBLLs include the following. First, in order to compare different models
in different urban areas, this research standardized a method to select independent
parameters by stepwise and backward elimination using SPSS. These methods could
be used to compare the city models to cities in other states. Second, both state and
urban area models were generated in this research. None of the models had exactly
the same parameters. This result shows the necessity of generating models for different
locations. Third, it was found that some of the same parameters existed in the models
of the same urban size or location in Indiana. The results show that geographic factors
could be potential elements in building a model for children’s EBLLs. How to
incorporate geographic parameters into the model requires additional research.
Fourth, applying the state model to each of the urban areas can help to test whether
a state level model could be applied to each location. From Table 4.6, it can be
concluded that the state level model has poorer accuracy when compared to the urban
area models. However, building a model for each urban area would require substantial
time and may not be achievable because of the lack of data. How to
balance the weaknesses of both the urban area models and the state model would be a
fruitful direction for further study.
This research primarily explored different models of children’s BLLs in
Indiana based on socio-economic and housing parameters that are inherently
geographic. Some researchers also suggest linking children’s BLLs with lead dust
changes, weather, and soil moisture, as listed in Chapter Two. With the data
available, this research could not achieve that goal. In addition, using finer
resolutions, such as individual-level data, to build the models would be more accurate and
could help to better understand the parameters in the models.
6. REFERENCES
Canfield RL, Henderson CR, Cory-Slechta DA, Cox C, Jusko TA, Lanphear BP. (2003).
Intellectual Impairment in children with blood lead concentration below 10 ug per
deciliter. N Engl J Med 348:1517-1526
CDC. (1997). Screening Young Children for Lead Poisoning: Guidance for State and
Local Public Health Officials. Atlanta, GA: Centers for Disease Control and
Prevention.
Mushak P. (1992). Defining lead as the premiere environmental health issue for
children in America: criteria and their quantitative application. Environmental Research
59:281-309
Tonny J. Oyana and Florence M. Margai. (2007). Geographic Analysis of Health Risks
of Pediatric Lead Exposure: A Golden Opportunity to Promote Healthy
Neighborhoods. Archives of Environmental & Occupational Health 62(2):93-104
Kristen Rappazzo, Curtis E. Cummings, Robert M. Himmelsbach, and Richard Tobin.
(2007). The Effect of Housing Compliance Status on Children’s Blood Lead Levels.
Archives of Environmental & Occupational Health 62(2):81-85
Landrigan PJ, Schechter CB, Lipton JM, Fahs MC, Schwartz J. (2002). Environmental
pollutants and disease in American children: estimates of morbidity, mortality and
costs for lead poisoning, asthma, cancer, and developmental disabilities. Environ
Health Perspect 110:721-728
Dohyeong Kim, M. Alicia Overstreet Galeano, Andrew Hull, and Marie Lynn
Miranda. (2008). A Framework for Widespread Replication of a Highly Spatially
Resolved Childhood Lead Exposure Risk Model. Environmental Health Perspectives
116(12):1735-1739
Ellen K. Cromley and Sara L. McLafferty. (2002). GIS and Public Health. New York:
The Guilford Press, A Division of Guilford Publications, Inc.
Indiana State Department of Health. (2004). Indiana’s Childhood Lead Poisoning
Elimination Plan.
Lisel A. O’Dwyer. (1998). The Use of GIS in Identifying Risk of Elevated Blood Lead
Levels in Australia. GIS in Public Health 3rd National Conference. San Diego, CA
Valerie B. Haley and Thomas O. Talbot. (2004). Geographic Analysis of Blood Lead
Levels in New York State Children Born 1994-1997. Environmental Health
Perspectives 112:1577-1582
Thomas O. Talbot, Steven P. Forand, Valerie B. Haley. (1998). Geographic Analysis of
Childhood Exposure in New York State. GIS in Public Health 3rd National
Conference. San Diego, CA
Kay Bruening, Francis W. Kemp, Nicole Simone, Yvette Holding, Donald B. Louria,
and John D. Bogden. (1999). Dietary Calcium Intakes of Urban Children at Risk of
Lead Poisoning. Environmental Health Perspectives 107(6):431-435
Mark R. Farfel, Anna O. Orlova, Peter S. J. Lees, Charles Rohde, Peter J. Ashley, and
Julian Chisolm, Jr. (2003). A Study of Urban Housing Demolitions as Sources of Lead
in Ambient Dust: Demolition Practices and Exterior Dust Fall. Environmental
Health Perspectives 111(9):1228-1234
Ghassan A. Khoury and Gary L. Diamond. (2003). Risks to children from exposure to
lead in air during remedial or removal activities at Superfund sites: A case study of the
RSR lead smelter Superfund site. Journal of Exposure Analysis and Environmental
Epidemiology 13(1):51-65
Mark A.S. Laidlaw, Howard W. Mielke, Gabriel M. Filippelli, David L. Johnson, and
Christopher R. Gonzales. (2005). Seasonality and Children’s Blood Lead Levels:
Developing a Predictive Model Using Climatic Variables and Blood Lead Data from
Indianapolis, Indiana, Syracuse, New York, and New Orleans, Louisiana
(USA). Environmental Health Perspectives 113(6):793-800
Marie Lynn Miranda, Dana C. Dolinoy, and Alicia Overstreet. (2002). Mapping for
Prevention: GIS Models for Directing Childhood Lead Poisoning Prevention Programs.
Environmental Health Perspectives 110(9):947-953
James R. Roberts, Thomas C. Hulsey, Gerald B. Curtis, and J. Routt Reigart. (2003).
Using Geographic Information Systems to Assess Risk for Elevated Blood Lead
Levels in Children. Public Health Reports 118:221-228
Florence Lansana Margai. (1998). Geographic Information Analysis of Pediatric Lead
Poisoning. GIS in Public Health 3rd National Conference. San Diego, CA
J. Chapman McGrew, Jr. and Charles B. Monroe. (2000). An Introduction to Statistical
Problem Solving in Geography. Boston, Madison, New York: McGraw Hill Press
Indiana Geological Survey. 2008. http://129.79.145.7/arcims/statewide_mxd/index.html
APPENDIX A
7.1 State Level Model:
Table 7.1: Test the Normal Distribution of residuals of dependent variable in State level
Model
Tests of Normality
                          Kolmogorov-Smirnov(a)        Shapiro-Wilk
                          Statistic   df    Sig.       Statistic   df    Sig.
Unstandardized Residual   .032        465   .200*      .995        465   .125
*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction
Figure 7.1: Histogram of residual of dependent variable in Indiana State
Table 7.2: Result of using the stepwise for selected tracts in Indiana
Model Summary(g)
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   df1   df2   Sig. F Change   Durbin-Watson
1       .515a   .265       .263                **********                   .265              166.943    1     463   .000
2       .656b   .430       .427                **********                   .165              133.496    1     462   .000
3       .692c   .479       .476                **********                   .049              43.498     1     461   .000
4       .733d   .538       .534                **********                   .059              58.310     1     460   .000
5       .743e   .552       .547                **********                   .015              15.151     1     459   .000
6       .747f   .558       .552                **********                   .005              5.504      1     458   .019            1.637
a. Predictors: (Constant), AVE_FAM_SZ
b. Predictors: (Constant), AVE_FAM_SZ, AVE_HH_SZ
c. Predictors: (Constant), AVE_FAM_SZ, AVE_HH_SZ, Ratio_1980
d. Predictors: (Constant), AVE_FAM_SZ, AVE_HH_SZ, Ratio_1980, Ratio_1950
e. Predictors: (Constant), AVE_FAM_SZ, AVE_HH_SZ, Ratio_1980, Ratio_1950, Poverty_Ra
f. Predictors: (Constant), AVE_FAM_SZ, AVE_HH_SZ, Ratio_1980, Ratio_1950, Poverty_Ra, Ratio_H_Ed
g. Dependent Variable: Ln(Ratio_Pb_10+1)
Table 7.3: Result of using the backward elimination method for selected tracts in Indiana
Model Summary(h)
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   df1   df2   Sig. F Change   Durbin-Watson
1       .749a   .561       .549                **********                   .561              48.091     12    452   .000
2       .749b   .561       .550                **********                   .000              .016       1     452   .898
3       .749c   .561       .551                **********                   .000              .024       1     453   .876
4       .749d   .561       .552                **********                   .000              .152       1     454   .697
5       .749e   .560       .553                **********                   .000              .261       1     455   .610
6       .748f   .559       .553                **********                   -.001             .933       1     456   .335
7       .747g   .558       .552                **********                   -.002             1.848      1     457   .175            1.637
a. Predictors: (Constant), Poverty_Ra, Ratio_1980, AVE_HH_SZ, Ratio_1950, Ratio_Hisp, Ratio_Vaca, Ratio_Blac, Ratio_H_Ed, Ratio_Rent,
Poverty_Wi, Ratio_One_, AVE_FAM_SZ
b. Predictors: (Constant), Poverty_Ra, Ratio_1980, AVE_HH_SZ, Ratio_1950, Ratio_Hisp, Ratio_Vaca, Ratio_Blac, Ratio_H_Ed, Ratio_Rent,
Ratio_One_, AVE_FAM_SZ
c. Predictors: (Constant), Poverty_Ra, Ratio_1980, AVE_HH_SZ, Ratio_1950, Ratio_Hisp, Ratio_Vaca, Ratio_Blac, Ratio_H_Ed, Ratio_Rent,
AVE_FAM_SZ
d. Predictors: (Constant), Poverty_Ra, Ratio_1980, AVE_HH_SZ, Ratio_1950, Ratio_Vaca, Ratio_Blac, Ratio_H_Ed, Ratio_Rent, AVE_FAM_SZ
e. Predictors: (Constant), Poverty_Ra, Ratio_1980, AVE_HH_SZ, Ratio_1950, Ratio_Vaca, Ratio_H_Ed, Ratio_Rent, AVE_FAM_SZ
f. Predictors: (Constant), Poverty_Ra, Ratio_1980, AVE_HH_SZ, Ratio_1950, Ratio_H_Ed, Ratio_Rent, AVE_FAM_SZ
g. Predictors: (Constant), Poverty_Ra, Ratio_1980, AVE_HH_SZ, Ratio_1950, Ratio_H_Ed, AVE_FAM_SZ
h. Dependent Variable: Ln(Ratio_Pb_10+1)
Table 7.4: Coefficient of Indiana Model
Coefficients(a)
               Unstandardized Coefficients   Standardized Coefficients                     Correlations                      Collinearity Statistics
Model 1        B         Std. Error          Beta                        t        Sig.     Zero-order   Partial   Part      Tolerance   VIF
(Constant)     -1.223    .537                                            -2.277   .023
AVE_FAM_SZ     2.099     .216                .639                        9.719    .000     .515         .414      .302      .224        4.471
AVE_HH_SZ      -.997     .149                -.386                       -6.672   .000     .154         -.298     -.207     .288        3.467
Ratio_1980     -1.171    .145                -.275                       -8.100   .000     -.387        -.354     -.252     .836        1.196
Ratio_1950     2.019     .321                .215                        6.292    .000     .362         .282      .196      .828        1.208
Poverty_Ra     .573      .227                .110                        2.530    .012     .490         .117      .079      .511        1.956
Ratio_H_Ed     -.762     .325                -.099                       -2.346   .019     -.509        -.109     -.073     .546        1.833
a. Dependent Variable: Ln(Ratio_Pb_10+1)
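The two collinearity columns in Table 7.4 carry the same information, because the variance inflation factor is the reciprocal of the tolerance. The short check below recomputes the VIF from the tolerances listed in the table; the small gaps that remain are what one would expect from tolerances rounded to three decimals. The dictionary and function names are hypothetical.

```python
# (Tolerance, reported VIF) pairs copied from Table 7.4.
TABLE_7_4 = {
    "AVE_FAM_SZ": (0.224, 4.471),
    "AVE_HH_SZ": (0.288, 3.467),
    "Ratio_1980": (0.836, 1.196),
    "Ratio_1950": (0.828, 1.208),
    "Poverty_Ra": (0.511, 1.956),
    "Ratio_H_Ed": (0.546, 1.833),
}


def vif_from_tolerance(tolerance):
    """Variance inflation factor as the reciprocal of the tolerance."""
    return 1.0 / tolerance


# Largest gap between the recomputed and the reported VIF.
max_gap = max(
    abs(vif_from_tolerance(tolerance) - reported_vif)
    for tolerance, reported_vif in TABLE_7_4.values()
)
for name, (tolerance, reported_vif) in TABLE_7_4.items():
    print(f"{name}: 1/tolerance = {vif_from_tolerance(tolerance):.3f}, reported {reported_vif}")
```

The two size variables, AVE_FAM_SZ and AVE_HH_SZ, have the lowest tolerances in the table, which indicates that they overlap with the other predictors more than the housing and poverty variables do.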
Figure 7.2: Distribution of dependent variable to each independent variable in Indiana.
7.2. Muncie Model:
Table 7.5: Test the Normal Distribution of residual of dependent variable in Muncie Model
Tests of Normality
                          Kolmogorov-Smirnov(a)        Shapiro-Wilk
                          Statistic   df    Sig.       Statistic   df    Sig.
Unstandardized Residual   .106        21    .200*      .967        21    .670
*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction
Figure 7.3: Histogram of dependent variable in Muncie
Table 7.6: Result of using the stepwise method for selected tracts in Muncie
Model Summary(b)
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   df1   df2   Sig. F Change   Durbin-Watson
1       .783a   .613       .593                **********                   .613              30.105     1     19    .000            1.672
a. Predictors: (Constant), Ratio_1980
b. Dependent Variable: Ln(Ratio_Pb_10+1)
Table 7.7: Coefficients of Muncie Model using stepwise method
Coefficients(a)
               Unstandardized Coefficients   Standardized Coefficients                     Correlations                      Collinearity Statistics
Model 1        B         Std. Error          Beta                        t        Sig.     Zero-order   Partial   Part      Tolerance   VIF
(Constant)     3.444     .375                                            9.179    .000
Ratio_1980     -3.299    .601                -.783                       -5.487   .000     -.783        -.783     -.783     1.000       1.000
a. Dependent Variable: Ln(Ratio_Pb_10+1)
Table 7.8: Result of using the backward elimination method for selected tracts in Muncie
Model Summary(k)
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   df1   df2   Sig. F Change   Durbin-Watson
1       .876a   .768       .419                **********                   .768              2.202      12    8     .134
2       .876b   .768       .484                **********                   .000              .001       1     8     .979
3       .876c   .768       .535                **********                   .000              .003       1     9     .961
4       .876d   .767       .576                **********                   -.001             .040       1     10    .846
5       .871e   .759       .599                **********                   -.007             .350       1     11    .566
6       .868f   .754       .622                **********                   -.005             .255       1     12    .622
7       .861g   .741       .630                **********                   -.013             .698       1     13    .419
8       .848h   .719       .626                **********                   -.022             1.169      1     14    .298
9       .820i   .673       .591                **********                   -.047             2.492      1     15    .135
10      .803j   .644       .582                **********                   -.028             1.370      1     16    .259            1.487
a. Predictors: (Constant), Poverty_Ra, Ratio_1950, Ratio_1980, AVE_HH_SZ, Ratio_Blac, Ratio_Rent, Ratio_Hisp, AVE_FAM_SZ, Ratio_H_Ed,
Poverty_Wi, Ratio_One_, Ratio_Vaca
b. Predictors: (Constant), Poverty_Ra, Ratio_1950, Ratio_1980, Ratio_Blac, Ratio_Rent, Ratio_Hisp, AVE_FAM_SZ, Ratio_H_Ed, Poverty_Wi,
Ratio_One_, Ratio_Vaca
c. Predictors: (Constant), Poverty_Ra, Ratio_1950, Ratio_1980, Ratio_Blac, Ratio_Rent, Ratio_Hisp, AVE_FAM_SZ, Poverty_Wi, Ratio_One_,
Ratio_Vaca
d. Predictors: (Constant), Poverty_Ra, Ratio_1950, Ratio_1980, Ratio_Blac, Ratio_Rent, Ratio_Hisp, AVE_FAM_SZ, Ratio_One_, Ratio_Vaca
e. Predictors: (Constant), Poverty_Ra, Ratio_1950, Ratio_Blac, Ratio_Rent, Ratio_Hisp, AVE_FAM_SZ, Ratio_One_, Ratio_Vaca
f. Predictors: (Constant), Poverty_Ra, Ratio_Blac, Ratio_Rent, Ratio_Hisp, AVE_FAM_SZ, Ratio_One_, Ratio_Vaca
g. Predictors: (Constant), Poverty_Ra, Ratio_Blac, Ratio_Rent, Ratio_Hisp, Ratio_One_, Ratio_Vaca
h. Predictors: (Constant), Poverty_Ra, Ratio_Rent, Ratio_Hisp, Ratio_One_, Ratio_Vaca
i. Predictors: (Constant), Poverty_Ra, Ratio_Rent, Ratio_One_, Ratio_Vaca
j. Predictors: (Constant), Poverty_Ra, Ratio_One_, Ratio_Vaca
k. Dependent Variable: Ln(Ratio_Pb_10+1)
Table 7.9: Coefficients of Muncie Model using the backward elimination method
Coefficients(a)
               Unstandardized Coefficients   Standardized Coefficients                     Correlations                      Collinearity Statistics
Model 1        B         Std. Error          Beta                        t        Sig.     Zero-order   Partial   Part      Tolerance   VIF
(Constant)     .848      .262                                            3.236    .005
Ratio_Vaca     7.304     1.561               .791                        4.679    .000     .588         .750      .677      .731        1.367
Poverty_Ra     -3.474    .953                -.702                       -3.647   .002     -.034        -.663     -.527     .564        1.773
Ratio_One_     6.297     2.397               .462                        2.627    .018     .336         .537      .380      .677        1.477
a. Dependent Variable: Ln(Ratio_Pb_10+1)
Figure 7.4: Distribution of dependent variable according to each independent variable in
Muncie
7.3 Evansville Model:
Table 7.10: Test the Normal Distribution of residual of dependent variable in Evansville Model
Tests of Normality
                         Kolmogorov-Smirnov(a)      Shapiro-Wilk
                         Statistic  df   Sig.       Statistic  df   Sig.
Unstandardized Residual  .110       36   .200*      .977       36   .641
*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction
Figure 7.5: Histogram of residual of dependent variable in Evansville.
Table 7.11: Result of using the stepwise method for selected tracts in Evansville
Model Summary(b)
Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change  Durbin-Watson
1      .870a  .757      .750               .375                        .757             105.966   1    34   .000           2.474
a. Predictors: (Constant), Ratio_Vaca
b. Dependent Variable: Ln(Ratio_pb_10+1)
Table 7.12: Coefficients of Evansville Model using stepwise method
Coefficients(a)
                   Unstandardized       Standardized                  Correlations                 Collinearity
Model  Variable    B        Std. Error  Beta          t        Sig.   Zero-order  Partial  Part    Tolerance  VIF
1      (Constant)  .757     .114                      6.658    .000
       Ratio_Vaca  10.460   1.016       .870          10.294   .000   .870        .870     .870    1.000      1.000
a. Dependent Variable: Ln(Ratio_pb_10+1)
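As an illustration of how the equation in Table 7.12 could be applied, the short Python sketch below evaluates the fitted model and inverts the Ln(x + 1) transform. The function names and the example vacancy ratio are hypothetical, and the sketch assumes that Ratio_Vaca is expressed as a proportion; only the two coefficients are taken from the table.

```python
import math

# Coefficients copied from Table 7.12 (Evansville, stepwise method).
INTERCEPT = 0.757
B_RATIO_VACA = 10.460


def predict_ln_ratio(ratio_vaca: float) -> float:
    """Fitted value of Ln(Ratio_pb_10 + 1) for a given vacancy ratio."""
    return INTERCEPT + B_RATIO_VACA * ratio_vaca


def back_transform(ln_value: float) -> float:
    """Invert the Ln(x + 1) transform to return to the Ratio_pb_10 scale."""
    return math.exp(ln_value) - 1.0


# Hypothetical tract with a vacancy ratio of 0.10 (not a value from the thesis data).
example_vacancy = 0.10
fitted = predict_ln_ratio(example_vacancy)
print(f"fitted Ln(Ratio_pb_10 + 1) = {fitted:.3f}")
print(f"back-transformed Ratio_pb_10 = {back_transform(fitted):.3f}")
```

The back-transformed value is on whatever scale Ratio_pb_10 was recorded in; the sketch makes no assumption about whether that is a proportion or a percentage.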
Table 7.13: Result of using the backward elimination method for selected tracts in Evansville
Model Summary(k)
Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change  Durbin-Watson
1      .907a  .823      .731               .390                        .823             8.907     12   23   .000
2      .907b  .823      .742               .382                        .000             .009      1    23   .924
3      .907c  .823      .752               .374                        .000             .034      1    24   .855
4      .907d  .822      .761               .367                        .000             .015      1    25   .903
5      .907e  .822      .770               .360                        .000             .016      1    26   .901
6      .907f  .822      .778               .354                        .000             .045      1    27   .834
7      .905g  .818      .781               .351                        -.004            .570      1    28   .456
8      .902h  .813      .782               .351                        -.005            .876      1    29   .357
9      .897i  .804      .779               .353                        -.009            1.436     1    30   .240
10     .893j  .797      .778               .353                        -.007            1.049     1    31   .314           2.425
a. Predictors: (Constant), Poverty_Ra, Ratio_1950, AVE_HH_SZ, Ratio_1980, Ratio_Hisp, Ratio_H_Ed, Ratio_Blac, Ratio_One_, Poverty_Wi,
Ratio_Vaca, AVE_FAM_SZ, Ratio_Rent
b. Predictors: (Constant), Poverty_Ra, Ratio_1950, AVE_HH_SZ, Ratio_1980, Ratio_Hisp, Ratio_H_Ed, Ratio_Blac, Ratio_One_, Ratio_Vaca,
AVE_FAM_SZ, Ratio_Rent
c. Predictors: (Constant), Poverty_Ra, Ratio_1950, AVE_HH_SZ, Ratio_1980, Ratio_Hisp, Ratio_H_Ed, Ratio_Blac, Ratio_One_, Ratio_Vaca,
AVE_FAM_SZ
d. Predictors: (Constant), Poverty_Ra, Ratio_1950, AVE_HH_SZ, Ratio_1980, Ratio_Hisp, Ratio_H_Ed, Ratio_Blac, Ratio_Vaca, AVE_FAM_SZ
e. Predictors: (Constant), Poverty_Ra, Ratio_1950, Ratio_1980, Ratio_Hisp, Ratio_H_Ed, Ratio_Blac, Ratio_Vaca, AVE_FAM_SZ
f. Predictors: (Constant), Poverty_Ra, Ratio_1950, Ratio_1980, Ratio_Hisp, Ratio_H_Ed, Ratio_Vaca, AVE_FAM_SZ
g. Predictors: (Constant), Poverty_Ra, Ratio_1950, Ratio_1980, Ratio_Hisp, Ratio_Vaca, AVE_FAM_SZ
h. Predictors: (Constant), Poverty_Ra, Ratio_1950, Ratio_1980, Ratio_Vaca, AVE_FAM_SZ
i. Predictors: (Constant), Ratio_1950, Ratio_1980, Ratio_Vaca, AVE_FAM_SZ
j. Predictors: (Constant), Ratio_1950, Ratio_Vaca, AVE_FAM_SZ
k. Dependent Variable: Ln(Ratio_pb_10+1)
Table 7.14: Coefficients of Evansville Model using backward elimination method
Coefficients(a)
                   Unstandardized       Standardized                  Correlations                 Collinearity
Model  Variable    B        Std. Error  Beta          t        Sig.   Zero-order  Partial  Part    Tolerance  VIF
1      (Constant)  -1.270   .981                      -1.295   .204
       Ratio_1950  1.074    .581        .148          1.849    .074   .134        .311     .147    .987       1.013
       Ratio_Vaca  9.408    1.101       .783          8.541    .000   .870        .834     .680    .754       1.326
       AVE_FAM_SZ  .663     .346        .176          1.917    .064   .548        .321     .153    .747       1.338
a. Dependent Variable: Ln(Ratio_pb_10+1)
Figure 7.6: Distribution of dependent variable according to each independent variable in Evansville
7.4 Indianapolis Model:
Table 7.15: Test the Normal Distribution of residuals of dependent variable in Indianapolis Model
Tests of Normality
                         Kolmogorov-Smirnov(a)      Shapiro-Wilk
                         Statistic  df   Sig.       Statistic  df   Sig.
Unstandardized Residual  .086       125  .024       .986       125  .230
a. Lilliefors Significance Correction
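The two tests in Table 7.15 do not point the same way, which is easier to see when each Sig. value is compared with a threshold. The sketch below does this in Python; the 0.05 significance level and the names used are assumptions made for illustration, not settings taken from the analysis.

```python
ALPHA = 0.05  # assumed significance level; not stated in the table

# Sig. values copied from Table 7.15 (Indianapolis residuals, df = 125).
sig_values = {
    "Kolmogorov-Smirnov (Lilliefors)": 0.024,
    "Shapiro-Wilk": 0.230,
}


def rejects_normality(sig: float, alpha: float = ALPHA) -> bool:
    """True when the test rejects the hypothesis of normal residuals."""
    return sig < alpha


decisions = {name: rejects_normality(sig) for name, sig in sig_values.items()}
for name, rejected in decisions.items():
    verdict = "reject" if rejected else "do not reject"
    print(f"{name}: {verdict} normality at alpha = {ALPHA}")
```

Under this assumed threshold the Kolmogorov-Smirnov test rejects normality while the Shapiro-Wilk test does not, so the Indianapolis residuals are a borderline case compared with the other urban areas.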
Figure 7.7: Histogram of residual of dependent variable in Indianapolis.
Table 7.16: Result of using the stepwise method for selected tracts in Indianapolis
Model Summary(f)
Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change  Durbin-Watson
1      .616a  .379      .374               **********                  .379             75.139    1    123  .000
2      .759b  .576      .569               **********                  .196             56.426    1    122  .000
3      .787c  .619      .610               **********                  .044             13.901    1    121  .000
4      .810d  .655      .644               **********                  .036             12.606    1    120  .001
5      .823e  .677      .663               **********                  .021             7.787     1    119  .006           2.042
a. Predictors: (Constant), Ratio_1980
b. Predictors: (Constant), Ratio_1980, Poverty_Ra
c. Predictors: (Constant), Ratio_1980, Poverty_Ra, Ratio_1950
d. Predictors: (Constant), Ratio_1980, Poverty_Ra, Ratio_1950, AVE_FAM_SZ
e. Predictors: (Constant), Ratio_1980, Poverty_Ra, Ratio_1950, AVE_FAM_SZ, Ratio_One_
f. Dependent Variable: Ln(Ratio_Pb_10+1)
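The Adjusted R Square column of Table 7.16 can be cross-checked from R Square, the number of tracts, and the number of predictors in each step. The sketch below uses the standard adjustment formula, 1 - (1 - R²)(n - 1)/(n - p - 1); the function name is hypothetical, and n = 125 is taken from the df reported in Table 7.15.

```python
def adjusted_r_square(r_square: float, n: int, p: int) -> float:
    """Adjusted R Square for n observations and p predictors."""
    return 1.0 - (1.0 - r_square) * (n - 1) / (n - p - 1)


N_TRACTS = 125  # number of Indianapolis tracts (df in Table 7.15)

# R Square and Adjusted R Square for models 1 to 5, copied from Table 7.16.
r_square_by_model = [0.379, 0.576, 0.619, 0.655, 0.677]
reported_adjusted = [0.374, 0.569, 0.610, 0.644, 0.663]

# Model k in the stepwise sequence contains k predictors.
recomputed = [
    adjusted_r_square(r2, N_TRACTS, p)
    for p, r2 in enumerate(r_square_by_model, start=1)
]

for p, (calc, rep) in enumerate(zip(recomputed, reported_adjusted), start=1):
    print(f"model {p}: recomputed {calc:.4f}, reported {rep:.3f}")
```

Small differences in the last digit are expected because the R Square values in the table are themselves rounded to three decimals.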
Table 7.17: Coefficients of Indianapolis Model using stepwise method
Coefficients(a)
                   Unstandardized       Standardized                  Correlations                 Collinearity
Model  Variable    B        Std. Error  Beta          t        Sig.   Zero-order  Partial  Part    Tolerance  VIF
1      (Constant)  -1.080   .710                      -1.521   .131
       Ratio_1980  -1.168   .214        -.358         -5.466   .000   -.616       -.448    -.285   .634       1.577
       Poverty_Ra  1.879    .338        .398          5.565    .000   .603        .454     .290    .532       1.879
       Ratio_1950  1.586    .437        .199          3.630    .000   .337        .316     .189    .908       1.102
       AVE_FAM_SZ  1.100    .239        .348          4.593    .000   .511        .388     .239    .473       2.112
       Ratio_One_  -2.778   .995        -.234         -2.790   .006   .121        -.248    -.145   .388       2.577
a. Dependent Variable: Ln(Ratio_Pb_10+1)
Table 7.18: Result of using the backward elimination method for selected tracts in Indianapolis
Model Summary(g)
Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change  Durbin-Watson
1      .842a  .709      .678               **********                  .709             22.739    12   112  .000
2      .842b  .709      .681               **********                  .000             .035      1    112  .852
3      .842c  .709      .683               **********                  .000             .152      1    113  .697
4      .841d  .708      .685               **********                  -.001            .344      1    114  .559
5      .839e  .703      .683               **********                  -.004            1.739     1    115  .190
6      .835f  .698      .679               **********                  -.006            2.207     1    116  .140           2.056
a. Predictors: (Constant), Poverty_Ra, Ratio_Hisp, AVE_HH_SZ, Ratio_1950, Ratio_1980, Ratio_Rent, Ratio_Blac, Ratio_Vaca, Ratio_H_Ed,
Poverty_Wi, Ratio_One_, AVE_FAM_SZ
b. Predictors: (Constant), Poverty_Ra, Ratio_Hisp, AVE_HH_SZ, Ratio_1950, Ratio_1980, Ratio_Rent, Ratio_Vaca, Ratio_H_Ed, Poverty_Wi,
Ratio_One_, AVE_FAM_SZ
c. Predictors: (Constant), Poverty_Ra, Ratio_Hisp, AVE_HH_SZ, Ratio_1950, Ratio_1980, Ratio_Vaca, Ratio_H_Ed, Poverty_Wi, Ratio_One_,
AVE_FAM_SZ
d. Predictors: (Constant), Poverty_Ra, Ratio_Hisp, AVE_HH_SZ, Ratio_1950, Ratio_1980, Ratio_H_Ed, Poverty_Wi, Ratio_One_, AVE_FAM_SZ
e. Predictors: (Constant), Poverty_Ra, AVE_HH_SZ, Ratio_1950, Ratio_1980, Ratio_H_Ed, Poverty_Wi, Ratio_One_, AVE_FAM_SZ
f. Predictors: (Constant), Poverty_Ra, AVE_HH_SZ, Ratio_1950, Ratio_1980, Ratio_H_Ed, Poverty_Wi, AVE_FAM_SZ
g. Dependent Variable: Ln(Ratio_Pb_10+1)
Table 7.19: Coefficients of Indianapolis Model using backward elimination method
Coefficients(a)
                   Unstandardized       Standardized                  Correlations                 Collinearity
Model  Variable    B        Std. Error  Beta          t        Sig.   Zero-order  Partial  Part    Tolerance  VIF
1      (Constant)  .867     .890                      .975     .332
       Poverty_Ra  1.626    .513        .344          3.170    .002   .603        .281     .161    .219       4.563
       AVE_HH_SZ   -.677    .255        -.289         -2.656   .009   .165        -.238    -.135   .219       4.576
       Ratio_1950  1.319    .440        .165          3.000    .003   .337        .267     .153    .852       1.174
       Ratio_1980  -1.215   .203        -.372         -5.975   .000   -.616       -.484    -.304   .667       1.500
       Ratio_H_Ed  -1.285   .445        -.207         -2.886   .005   -.573       -.258    -.147   .502       1.991
       Poverty_Wi  -1.953   .968        -.183         -2.018   .046   .401        -.183    -.103   .313       3.193
       AVE_FAM_SZ  1.286    .379        .407          3.392    .001   .511        .299     .172    .180       5.568
a. Dependent Variable: Ln(Ratio_Pb_10+1)
Figure 7.8: Distribution of dependent variable according to each independent variable in
Indianapolis
7.5 Elkhart and Goshen Model:
Table 7.20: Test the Normal Distribution of residual of dependent variable in Elkhart and Goshen Model
Tests of Normality
                         Kolmogorov-Smirnov(a)      Shapiro-Wilk
                         Statistic  df   Sig.       Statistic  df   Sig.
Unstandardized Residual  .140       11   .200*      .949       11   .632
*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction
Figure 7.9: Histogram of residual of dependent variable in Elkhart and Goshen.
Table 7.21: Result of using the stepwise method for selected tracts in Elkhart and Goshen
Model Summary(b)
Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change  Durbin-Watson
1      .807a  .651      .612               **********                  .651             16.801    1    9    .003           1.616
a. Predictors: (Constant), Ratio_Hisp
b. Dependent Variable: Ln(Ratio_Pb_10+1)
Table 7.22: Coefficients of Elkhart and Goshen Model using stepwise method
Coefficients(a)
                   Unstandardized       Standardized                  Correlations                 Collinearity
Model  Variable    B        Std. Error  Beta          t        Sig.   Zero-order  Partial  Part    Tolerance  VIF
1      (Constant)  1.447    .235                      6.155    .000
       Ratio_Hisp  4.844    1.182       .807          4.099    .003   .807        .807     .807    1.000      1.000
a. Dependent Variable: Ln(Ratio_Pb_10+1)
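For a model with a single predictor, the t statistic in Table 7.22 is the coefficient divided by its standard error, and its square should agree with the F Change reported in Table 7.21. The following Python sketch checks both relations with the printed values; the variable names are hypothetical, and small discrepancies are expected because the table entries are rounded.

```python
# Ratio_Hisp row of Table 7.22 (Elkhart and Goshen, stepwise method).
b_ratio_hisp = 4.844
std_error = 1.182
reported_t = 4.099

# F Change for the same one-predictor model, from Table 7.21.
reported_f_change = 16.801

# t is the unstandardized coefficient divided by its standard error.
t_stat = b_ratio_hisp / std_error

# With one predictor, the F statistic equals the square of t.
f_from_t = t_stat ** 2

print(f"t recomputed: {t_stat:.3f} (reported {reported_t})")
print(f"F from t squared: {f_from_t:.3f} (reported {reported_f_change})")
```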
7.6 South Bend and Mishawaka Model:
Table 7.23: Test the Normal Distribution of residual of dependent variable in South Bend and Mishawaka Model
Tests of Normality
                         Kolmogorov-Smirnov(a)      Shapiro-Wilk
                         Statistic  df   Sig.       Statistic  df   Sig.
Unstandardized Residual  .111       33   .200*      .972       33   .527
*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction
Figure 7.10: Histogram of residual of dependent variable in South Bend and Mishawaka.
Table 7.24: Result of using the stepwise method for selected tracts in South Bend and Mishawaka
Model Summary(c)
Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change  Durbin-Watson
1      .782a  .612      .599               **********                  .612             48.803    1    31   .000
2      .830b  .689      .668               **********                  .077             7.447     1    30   .011           2.411
a. Predictors: (Constant), Ratio_Vaca
b. Predictors: (Constant), Ratio_Vaca, AVE_FAM_SZ
c. Dependent Variable: Ln(Ratio_Pb_10+1)
Table 7.25: Result of using the backward elimination method for selected tracts in South Bend and Mishawaka
Model Summary(l)
Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change  Durbin-Watson
1      .864a  .747      .595               **********                  .747             4.924     12   20   .001
2      .864b  .747      .615               **********                  .000             .001      1    20   .977
3      .864c  .747      .632               **********                  .000             .005      1    21   .945
4      .864d  .747      .648               **********                  .000             .014      1    22   .907
5      .864e  .747      .662               **********                  .000             .028      1    23   .869
6      .864f  .746      .675               **********                  .000             .046      1    24   .832
7      .863g  .745      .686               **********                  -.001            .137      1    25   .715
8      .861h  .741      .693               **********                  -.004            .405      1    26   .530
9      .856i  .732      .694               **********                  -.008            .884      1    27   .355
10     .845j  .713      .684               **********                  -.019            1.988     1    28   .170
11     .830k  .689      .668               **********                  -.024            2.473     1    29   .127           2.411
a. Predictors: (Constant), Ratio_One_, Ratio_1980, Ratio_Rent, Ratio_Hisp, Poverty_Wi, Ratio_1950, Ratio_Vaca, Ratio_H_Ed, Ratio_Blac,
Poverty_Ra, AVE_HH_SZ, AVE_FAM_SZ
b. Predictors: (Constant), Ratio_One_, Ratio_1980, Ratio_Hisp, Poverty_Wi, Ratio_1950, Ratio_Vaca, Ratio_H_Ed, Ratio_Blac, Poverty_Ra,
AVE_HH_SZ, AVE_FAM_SZ
c. Predictors: (Constant), Ratio_1980, Ratio_Hisp, Poverty_Wi, Ratio_1950, Ratio_Vaca, Ratio_H_Ed, Ratio_Blac, Poverty_Ra, AVE_HH_SZ,
AVE_FAM_SZ
d. Predictors: (Constant), Ratio_1980, Ratio_Hisp, Poverty_Wi, Ratio_1950, Ratio_Vaca, Ratio_H_Ed, Poverty_Ra, AVE_HH_SZ, AVE_FAM_SZ
e. Predictors: (Constant), Ratio_1980, Ratio_Hisp, Poverty_Wi, Ratio_Vaca, Ratio_H_Ed, Poverty_Ra, AVE_HH_SZ, AVE_FAM_SZ
f. Predictors: (Constant), Ratio_1980, Ratio_Hisp, Poverty_Wi, Ratio_Vaca, Ratio_H_Ed, AVE_HH_SZ, AVE_FAM_SZ
g. Predictors: (Constant), Ratio_1980, Poverty_Wi, Ratio_Vaca, Ratio_H_Ed, AVE_HH_SZ, AVE_FAM_SZ
h. Predictors: (Constant), Poverty_Wi, Ratio_Vaca, Ratio_H_Ed, AVE_HH_SZ, AVE_FAM_SZ
i. Predictors: (Constant), Ratio_Vaca, Ratio_H_Ed, AVE_HH_SZ, AVE_FAM_SZ
j. Predictors: (Constant), Ratio_Vaca, AVE_HH_SZ, AVE_FAM_SZ
k. Predictors: (Constant), Ratio_Vaca, AVE_FAM_SZ
l. Dependent Variable: Ln(Ratio_Pb_10+1)
Table 7.26: Coefficients of South Bend and Mishawaka using backward elimination method
Coefficients(a)
                   Unstandardized       Standardized                  Correlations                 Collinearity
Model  Variable    B        Std. Error  Beta          t        Sig.   Zero-order  Partial  Part    Tolerance  VIF
1      (Constant)  -.373    .682                      -.547    .589
       AVE_FAM_SZ  .661     .242        .368          2.729    .011   .722        .446     .278    .571       1.751
       Ratio_Vaca  7.392    1.841       .541          4.016    .000   .782        .591     .409    .571       1.751
a. Dependent Variable: Ln(Ratio_Pb_10+1)
Figure 7.11: Distribution of dependent variable according to each independent variable in South Bend and Mishawaka
7.7 Fort Wayne Model:
Table 7.27: Test the Normal Distribution of residual of dependent variable in Fort Wayne
Tests of Normality
                         Kolmogorov-Smirnov(a)      Shapiro-Wilk
                         Statistic  df   Sig.       Statistic  df   Sig.
Unstandardized Residual  .117       33   .200*      .958       33   .230
*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction
Figure 7.12: Histogram of residual of dependent variable in Fort Wayne
Table 7.28: Result of using the stepwise method for selected tracts in Fort Wayne
Model Summary(c)
Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change  Durbin-Watson
1      .685a  .469      .451               **********                  .469             27.338    1    31   .000
2      .754b  .568      .540               **********                  .100             6.927     1    30   .013           1.566
a. Predictors: (Constant), Ratio_H_Ed
b. Predictors: (Constant), Ratio_H_Ed, Ratio_1950
c. Dependent Variable: Ln(Ratio_Pb_10+1)
Table 7.29: Coefficients of Fort Wayne Model using stepwise method
Coefficients(a)
                   Unstandardized       Standardized                  Correlations                 Collinearity
Model  Variable    B        Std. Error  Beta          t        Sig.   Zero-order  Partial  Part    Tolerance  VIF
1      (Constant)  6.645    .808                      8.221    .000
       Ratio_H_Ed  -5.126   .980        -.685         -5.229   .000   -.685       -.685    -.685   1.000      1.000
2      (Constant)  5.269    .907                      5.813    .000
       Ratio_H_Ed  -3.981   .998        -.532         -3.988   .000   -.685       -.589    -.478   .810       1.235
       Ratio_1950  3.259    1.238       .351          2.632    .013   .583        .433     .316    .810       1.235
a. Dependent Variable: Ln(Ratio_Pb_10+1)
Table 7.30: Result of using the backward elimination method for selected tracts in Fort Wayne
Model Summary(h)
Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change  Durbin-Watson
1      .920a  .847      .755               **********                  .847             9.202     12   20   .000
2      .920b  .847      .766               **********                  .000             .008      1    20   .930
3      .920c  .846      .777               **********                  .000             .013      1    21   .910
4      .920d  .846      .785               **********                  -.001            .121      1    22   .731
5      .917e  .841      .788               **********                  -.004            .651      1    23   .428
6      .912f  .831      .784               **********                  -.010            1.512     1    24   .231
7      .908g  .825      .785               **********                  -.006            .929      1    25   .344           2.445
a. Predictors: (Constant), Poverty_Ra, Ratio_1980, Ratio_Rent, Ratio_1950, Ratio_Hisp, Ratio_Blac, Ratio_H_Ed, Ratio_Vaca, Poverty_Wi,
Ratio_One_, AVE_FAM_SZ, AVE_HH_SZ
b. Predictors: (Constant), Poverty_Ra, Ratio_1980, Ratio_Rent, Ratio_1950, Ratio_Hisp, Ratio_Blac, Ratio_H_Ed, Poverty_Wi, Ratio_One_,
AVE_FAM_SZ, AVE_HH_SZ
c. Predictors: (Constant), Poverty_Ra, Ratio_1980, Ratio_Rent, Ratio_1950, Ratio_Hisp, Ratio_Blac, Poverty_Wi, Ratio_One_, AVE_FAM_SZ,
AVE_HH_SZ
d. Predictors: (Constant), Poverty_Ra, Ratio_Rent, Ratio_1950, Ratio_Hisp, Ratio_Blac, Poverty_Wi, Ratio_One_, AVE_FAM_SZ, AVE_HH_SZ
e. Predictors: (Constant), Poverty_Ra, Ratio_Rent, Ratio_1950, Ratio_Blac, Poverty_Wi, Ratio_One_, AVE_FAM_SZ, AVE_HH_SZ
f. Predictors: (Constant), Poverty_Ra, Ratio_Rent, Ratio_1950, Ratio_Blac, Poverty_Wi, Ratio_One_, AVE_FAM_SZ
g. Predictors: (Constant), Poverty_Ra, Ratio_Rent, Ratio_1950, Poverty_Wi, Ratio_One_, AVE_FAM_SZ
h. Dependent Variable: Ln(Ratio_Pb_10+1)
Table 7.31: Coefficients of Fort Wayne Model using backward elimination method
Coefficients(a)
                   Unstandardized       Standardized                  Correlations                 Collinearity
Model  Variable    B        Std. Error  Beta          t        Sig.   Zero-order  Partial  Part    Tolerance  VIF
1      (Constant)  -5.224   1.719                     -3.040   .005
       Poverty_Ra  3.594    1.171       .855          3.070    .005   .582        .516     .252    .087       11.523
       Ratio_Rent  1.887    .502        .449          3.756    .001   .178        .593     .308    .470       2.125
       Ratio_1950  5.251    .977        .565          5.375    .000   .583        .725     .441    .609       1.643
       Poverty_Wi  -5.694   1.928       -.526         -2.953   .007   .429        -.501    -.242   .212       4.717
       Ratio_One_  -7.447   1.851       -.800         -4.024   .000   .394        -.619    -.330   .170       5.867
       AVE_FAM_SZ  2.244    .567        .740          3.962    .001   .621        .614     .325    .193       5.177
a. Dependent Variable: Ln(Ratio_Pb_10+1)
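The collinearity columns of Table 7.31 are linked by VIF = 1 / Tolerance, which gives a quick way to screen the predictors. The Python sketch below recomputes VIF from the printed tolerances and lists any predictor above a cutoff of 10; that cutoff is a common rule of thumb adopted here as an assumption, not a threshold stated in this analysis, and the names are hypothetical.

```python
# Tolerance values copied from Table 7.31 (Fort Wayne, backward elimination).
tolerance = {
    "Poverty_Ra": 0.087,
    "Ratio_Rent": 0.470,
    "Ratio_1950": 0.609,
    "Poverty_Wi": 0.212,
    "Ratio_One_": 0.170,
    "AVE_FAM_SZ": 0.193,
}

VIF_THRESHOLD = 10.0  # assumed rule-of-thumb cutoff for strong collinearity

# VIF is the reciprocal of tolerance.
vif = {name: 1.0 / tol for name, tol in tolerance.items()}

# Predictors whose recomputed VIF exceeds the assumed cutoff.
flagged = [name for name, value in vif.items() if value > VIF_THRESHOLD]

for name, value in vif.items():
    print(f"{name}: VIF = {value:.3f}")
print("Above threshold:", flagged)
```

The recomputed values differ slightly from the printed VIF column because the tolerances are rounded to three decimals.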
Figure 7.13: Distribution of dependent variable according to each independent variable in Fort Wayne
7.8 Northern Lake County Model:
Table 7.32: Test the Normal Distribution of residual of dependent variable in Northern Lake County Model
Tests of Normality
                         Kolmogorov-Smirnov(a)      Shapiro-Wilk
                         Statistic  df   Sig.       Statistic  df   Sig.
Unstandardized Residual  .133       25   .200*      .955       25   .317
*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction
Figure 7.14: Histogram of residual of dependent variable in Northern Lake County
Table 7.33: Result of using the stepwise method for selected tracts in Northern Lake County
Model Summary(b)
Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change  Durbin-Watson
1      .672a  .452      .428               **********                  .452             18.987    1    23   .000           1.641
a. Predictors: (Constant), Ratio_Vaca
b. Dependent Variable: Ln(Ratio_Pb_10+1)
Table 7.34: Coefficients of Northern Lake County Model using stepwise method
Coefficients(a)
                   Unstandardized       Standardized                  Correlations                 Collinearity
Model  Variable    B        Std. Error  Beta          t        Sig.   Zero-order  Partial  Part    Tolerance  VIF
1      (Constant)  1.255    .241                      5.206    .000
       Ratio_Vaca  7.484    1.718       .672          4.357    .000   .672        .672     .672    1.000      1.000
a. Dependent Variable: Ln(Ratio_Pb_10+1)
Table 7.35: Result of using the backward elimination method for selected tracts in Northern Lake County
Model Summary(i)
Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change  Durbin-Watson
1      .868a  .754      .507               **********                  .754             3.060     12   12   .032
2      .868b  .754      .545               **********                  .000             .005      1    12   .946
3      .868c  .753      .576               **********                  -.001            .049      1    13   .828
4      .867d  .751      .602               **********                  -.002            .091      1    14   .767
5      .865e  .749      .623               **********                  -.002            .142      1    15   .711
6      .863f  .744      .639               **********                  -.005            .297      1    16   .593
7      .853g  .727      .636               **********                  -.017            1.131     1    17   .302
8      .846h  .716      .641               **********                  -.011            .726      1    18   .405           2.359
a. Predictors: (Constant), Poverty_Ra, AVE_HH_SZ, Ratio_1980, Ratio_1950, Ratio_Blac, Ratio_Rent, Ratio_Vaca, Poverty_Wi, Ratio_H_Ed,
Ratio_Hisp, AVE_FAM_SZ, Ratio_One_
b. Predictors: (Constant), Poverty_Ra, AVE_HH_SZ, Ratio_1980, Ratio_1950, Ratio_Blac, Ratio_Rent, Ratio_Vaca, Poverty_Wi, Ratio_H_Ed,
Ratio_Hisp, AVE_FAM_SZ
c. Predictors: (Constant), Poverty_Ra, AVE_HH_SZ, Ratio_1980, Ratio_1950, Ratio_Blac, Ratio_Vaca, Poverty_Wi, Ratio_H_Ed, Ratio_Hisp,
AVE_FAM_SZ
d. Predictors: (Constant), Poverty_Ra, AVE_HH_SZ, Ratio_1980, Ratio_1950, Ratio_Blac, Poverty_Wi, Ratio_H_Ed, Ratio_Hisp, AVE_FAM_SZ
e. Predictors: (Constant), Poverty_Ra, Ratio_1980, Ratio_1950, Ratio_Blac, Poverty_Wi, Ratio_H_Ed, Ratio_Hisp, AVE_FAM_SZ
f. Predictors: (Constant), Poverty_Ra, Ratio_1980, Ratio_1950, Ratio_Blac, Ratio_H_Ed, Ratio_Hisp, AVE_FAM_SZ
g. Predictors: (Constant), Poverty_Ra, Ratio_1980, Ratio_1950, Ratio_Blac, Ratio_Hisp, AVE_FAM_SZ
h. Predictors: (Constant), Poverty_Ra, Ratio_1980, Ratio_Blac, Ratio_Hisp, AVE_FAM_SZ
i. Dependent Variable: Ln(Ratio_Pb_10+1)
Table 7.36: Coefficients of Northern Lake County Model using backward elimination method
Coefficients(a)
                   Unstandardized       Standardized                  Correlations                 Collinearity
Model  Variable    B        Std. Error  Beta          t        Sig.   Zero-order  Partial  Part    Tolerance  VIF
1      (Constant)  .218     1.551                     .141     .890
       Poverty_Ra  -1.160   .615        -.248         -1.888   .074   .023        -.397    -.231   .868       1.153
       Ratio_1980  -3.454   .736        -.664         -4.693   .000   -.589       -.733    -.574   .747       1.339
       Ratio_Blac  .696     .402        .339          1.732    .100   .340        .369     .212    .391       2.558
       Ratio_Hisp  -1.242   .699        -.385         -1.778   .091   -.112       -.378    -.217   .319       3.139
       AVE_FAM_SZ  1.361    .408        .483          3.340    .003   .356        .608     .408    .714       1.400
a. Dependent Variable: Ln(Ratio_Pb_10+1)
Figure 7.15: Distribution of dependent variable according to each independent variable
in Northern Lake County
7.9 Clarksville, New Albany and Jeffersonville Model:
Table 7.37: Test the Normal Distribution of residual of dependent variable in Clarksville, New Albany and Jeffersonville Model
Tests of Normality
                         Kolmogorov-Smirnov(a)      Shapiro-Wilk
                         Statistic  df   Sig.       Statistic  df   Sig.
Unstandardized Residual  .128       26   .200*      .976       26   .768
*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction
Figure 7.16: Histogram of residual of dependent variable in Clarksville, New Albany and
Jeffersonville
Table 7.38: Result of using the stepwise method for selected tracts in Clarksville, New Albany and Jeffersonville
Model Summary(c)
Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change  Durbin-Watson
1      .692a  .479      .457               **********                  .479             22.041    1    24   .000
2      .771b  .594      .559               **********                  .115             6.521     1    23   .018           1.935
a. Predictors: (Constant), Ratio_H_Ed
b. Predictors: (Constant), Ratio_H_Ed, Ratio_Vaca
c. Dependent Variable: Ln(Ratio_Pb_10+1)
Table 7.39: Coefficients of Clarksville, New Albany and Jeffersonville Model using stepwise method
Coefficients(a)
                   Unstandardized       Standardized                  Correlations                 Collinearity
Model  Variable    B        Std. Error  Beta          t        Sig.   Zero-order  Partial  Part    Tolerance  VIF
1      (Constant)  6.102    1.060                     5.758    .000
       Ratio_H_Ed  -6.145   1.309       -.692         -4.695   .000   -.692       -.692    -.692   1.000      1.000
2      (Constant)  3.769    1.322                     2.851    .009
       Ratio_H_Ed  -4.215   1.401       -.475         -3.008   .006   -.692       -.531    -.400   .709       1.410
       Ratio_Vaca  12.216   4.784       .403          2.554    .018   .659        .470     .339    .709       1.410
a. Dependent Variable: Ln(Ratio_Pb_10+1)
Table 7.40: Result of using the backward elimination method for selected tracts in Clarksville, New Albany and Jeffersonville
Model Summary(h)
Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change  Durbin-Watson
1      .885a  .783      .582               **********                  .783             3.899     12   13   .011
2      .884b  .781      .609               **********                  -.002            .103      1    13   .753
3      .884c  .781      .634               **********                  .000             .013      1    14   .910
4      .876d  .767      .636               **********                  -.013            .909      1    15   .355
5      .870e  .757      .643               **********                  -.010            .703      1    16   .414
6      .868f  .754      .658               **********                  -.003            .222      1    17   .644
7      .845g  .715      .624               **********                  -.039            2.880     1    18   .107           2.270
a. Predictors: (Constant), Poverty_Ra, Ratio_1980, AVE_HH_SZ, Ratio_Hisp, Ratio_1950, Ratio_Blac, Ratio_Vaca, Ratio_H_Ed, Ratio_Rent,
AVE_FAM_SZ, Ratio_One_, Poverty_Wi
b. Predictors: (Constant), Poverty_Ra, Ratio_1980, AVE_HH_SZ, Ratio_Hisp, Ratio_1950, Ratio_Blac, Ratio_Vaca, Ratio_H_Ed, AVE_FAM_SZ,
Ratio_One_, Poverty_Wi
c. Predictors: (Constant), Poverty_Ra, Ratio_1980, AVE_HH_SZ, Ratio_Hisp, Ratio_1950, Ratio_Blac, Ratio_Vaca, Ratio_H_Ed, AVE_FAM_SZ,
Poverty_Wi
d. Predictors: (Constant), Poverty_Ra, Ratio_1980, AVE_HH_SZ, Ratio_1950, Ratio_Blac, Ratio_Vaca, Ratio_H_Ed, AVE_FAM_SZ, Poverty_Wi
e. Predictors: (Constant), Poverty_Ra, Ratio_1980, AVE_HH_SZ, Ratio_1950, Ratio_Vaca, Ratio_H_Ed, AVE_FAM_SZ, Poverty_Wi
f. Predictors: (Constant), Poverty_Ra, Ratio_1980, AVE_HH_SZ, Ratio_1950, Ratio_Vaca, Ratio_H_Ed, Poverty_Wi
g. Predictors: (Constant), Poverty_Ra, AVE_HH_SZ, Ratio_1950, Ratio_Vaca, Ratio_H_Ed, Poverty_Wi
h. Dependent Variable: Ln(Ratio_Pb_10+1)
Table 7.41: Coefficients of Clarksville, New Albany and Jeffersonville Model using backward elimination method
Coefficients(a)
                   Unstandardized       Standardized                  Correlations                 Collinearity
Model  Variable    B        Std. Error  Beta          t        Sig.   Zero-order  Partial  Part    Tolerance  VIF
1      (Constant)  1.011    1.904                     .531     .602
       Poverty_Ra  -4.734   2.450       -.950         -1.932   .068   .495        -.405    -.237   .062       16.107
       AVE_HH_SZ   1.133    .524        .327          2.162    .044   -.179       .444     .265    .658       1.520
       Ratio_1950  3.051    1.559       .287          1.957    .065   .431        .410     .240    .697       1.434
       Ratio_Vaca  15.975   4.878       .527          3.275    .004   .659        .601     .401    .580       1.724
       Ratio_H_Ed  -4.620   1.715       -.520         -2.695   .014   -.692       -.526    -.330   .403       2.481
       Poverty_Wi  8.966    4.362       .917          2.056    .054   .450        .427     .252    .076       13.234
a. Dependent Variable: Ln(Ratio_Pb_10+1)
Figure 7.17: Distribution of dependent variable according to each independent variable in Clarksville, New Albany and Jeffersonville