ANALYSIS AND COMPARISON OF BLOOD LEAD RISK AREA MODELS FOR SELECTED URBAN AREAS IN INDIANA

A THESIS SUBMITTED TO THE GRADUATE SCHOOL IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE MASTER OF SCIENCE

BY YUNZHONG ZHAO
(ADVISOR: DR. KEVIN TURCOTTE)

BALL STATE UNIVERSITY
MUNCIE, INDIANA
JULY 2009

Acknowledgements

I would like to express my thanks to my advisor Dr. Turcotte for his help and encouragement with writing this analysis. I also would like to thank committee members Dr. Airriess and Dr. Yang for their guidance to improve this thesis. Thanks to all professors in the Geography Department at Ball State University for their help during my two years in the department. Thanks also to my classmates and friends Bernard, Andrea, Matt, and Michael for providing help and happiness during my study. Last but not least, I would like to thank my parents for their constant love and encouragement throughout my life.

This is dedicated to my parents, for their love and support through my life.

TABLE OF CONTENTS

Acknowledgements
LIST OF TABLES
LIST OF FIGURES
1. INTRODUCTION
    1.1 Overview
    1.2 Lead Problem in United States
    1.3 Progress in Lead Prevention Research
    1.4 Significance of the Study
    1.5 Research Objective
2. LITERATURE REVIEW
    2.1 Factors Associated with Lead Risk
    2.2 Current Models of Lead Risk
        2.2.1. Geographic Resolution
        2.2.2. Analysis Methods
        2.2.3. Extended Analysis of Current Models
3. DATA, METHODOLOGY, PROCEDURES
    3.1 Data
        3.1.1. Study Area
        3.1.2. Description of the Data
        3.1.3. Preparation of the Data
            3.1.3.1. US Census SF1 and SF3 Data
            3.1.3.2. Preparation of Children’s BLLs Data
            3.1.3.3. Preparation of GIS Data
        3.1.4. Software
    3.2 Methodology
        3.2.1. Selection of Geographic Resolution
        3.2.2. Selection of Urban Areas
        3.2.3. Model Building
            3.2.3.1. Selection of Independent and Dependent Variables
            3.2.3.2. Least Squares Regression
            3.2.3.3. Testing the Models
4. RESULTS
    4.1 Childhood Lead Screening in Indiana through Five Years
    4.2 Children BLLs in Selected Urban Areas
    4.3 Independent Variables
    4.4 Test the Model with Different Screened Data
    4.5 Model of Indiana
    4.6 Models of Selected Urban Areas
        4.6.1. Muncie
        4.6.2. Evansville
        4.6.3. Indianapolis
        4.6.4. Elkhart and Goshen
        4.6.5. South Bend and Mishawaka
        4.6.6. Fort Wayne
        4.6.7. Northern Lake County
        4.6.8. Clarksville, New Albany and Jeffersonville
        4.6.9. Comparison of Models Based on City Size
        4.6.10. Comparison of Models Based on Location
        4.6.11. Comparison of Models Based on Accuracy
5. SUMMARY AND DISCUSSION
6. REFERENCES
APPENDIX A
LIST OF TABLES

Table 3.1 Calculation the Socio-Economic Information
Table 3.2 Population and location of urban areas in Indiana
Table 4.1 Comparison of Model Parameters for Large Urban Areas
Table 4.2 Comparison of Model Parameters for Small Urban Areas
Table 4.3 Comparison of Model Parameters for Northern Indiana
Table 4.4 Comparison of Model Parameters for Central Indiana
Table 4.5 Comparison of Model Parameters for Southern Indiana
Table 4.6 Comparison of Models of different urban areas with the Model of Indiana
Table 7.1 Test the Normal Distribution of residuals of dependent variable in State level Model
Table 7.2 Result of using the stepwise for selected tracts in Indiana
Table 7.3 Result of using the backward elimination method for selected tracts in Indiana
Table 7.4 Coefficient of Indiana Model
Table 7.5 Test the Normal Distribution of residual of dependent variable in Muncie Model
Table 7.6 Result of using the stepwise method for selected tracts in Muncie
Table 7.7 Coefficients of Muncie Model using stepwise method
Table 7.8 Result of using the backward elimination method for selected tracts in Muncie
Table 7.9 Coefficients of Muncie Model using the backward elimination method
Table 7.10 Test the Normal Distribution of residual of dependent variable in Evansville Model
Table 7.11 Result of using the stepwise for selected tracts in Evansville
Table 7.12 Coefficients of Evansville Model using stepwise method
Table 7.13 Result of using the backward elimination method for selected tracts in Evansville
Table 7.14 Coefficients of Evansville Model using backward elimination method
Table 7.15 Test the Normal Distribution of residuals of dependent variable in Indianapolis Model
Table 7.16 Result of using the stepwise for selected tracts in Indianapolis
Table 7.17 Coefficients of Indianapolis Model using stepwise method
Table 7.18 Result of using the backward elimination method for selected tracts in Indianapolis
Table 7.19 Coefficients of Indianapolis Model using backward elimination method
Table 7.20 Test the Normal Distribution of residual of dependent variable in Elkhart and Goshen Model
Table 7.21 Result of using the stepwise for selected tracts in Elkhart and Goshen
Table 7.22 Coefficients of Elkhart and Goshen Model using stepwise method
Table 7.23 Test the Normal Distribution of residual of dependent variable in South Bend and Mishawaka Model
Table 7.24 Result of using the stepwise for selected tracts in South Bend and Mishawaka
Table 7.25 Result of using the backward elimination method for selected tracts in South Bend and Mishawaka
Table 7.26 Coefficients of South Bend and Mishawaka using backward elimination method
Table 7.27 Test the Normal Distribution of residual of dependent variable in Fort Wayne
Table 7.28 Result of using the stepwise for selected tracts in Fort Wayne
Table 7.29 Coefficients of Fort Wayne using backward elimination method
Table 7.30 Result of using the backward elimination method for selected tracts in Fort Wayne
Table 7.31 Coefficients of Fort Wayne Model using backward elimination method
Table 7.32 Test the Normal Distribution of residual of dependent variable in Northern Lake County Model
Table 7.33 Result of using the stepwise for selected tracts in Northern Lake County
Table 7.34 Coefficients of Northern Lake County Model using stepwise method
Table 7.35 Result of using the backward elimination method for selected tracts in Northern Lake County
Table 7.36 Coefficients of Northern Lake County Model using backward elimination method
Table 7.37 Test the Normal Distribution of residual of dependent variable in Clarksville, New Albany and Jeffersonville Model
Table 7.38 Result of using the stepwise for selected tracts in Clarksville, New Albany and Jeffersonville
Table 7.39 Coefficients of Clarksville, New Albany and Jeffersonville Model using stepwise method
Table 7.40 Result of using the backward elimination method for selected tracts in Clarksville, New Albany and Jeffersonville
Table 7.41 Coefficients of Clarksville, New Albany and Jeffersonville Model using backward elimination method

LIST OF FIGURES

Figure 3.1 Urban Areas in Indiana
Figure 3.2 Preparation of US Census SF1 and SF3 data
Figure 3.3 Preparation of Children’s Blood Lead Levels (BLLs) data
Figure 3.4 Procedure for selecting urban areas
Figure 3.5 Selected urban areas in Indiana
Figure 3.6 Least squares regression Line
Figure 4.1 Number of Screened and EBLLs Children in Indiana from 1998 to 2002
Figure 4.2 Percentage of Children Screened with EBLLs in Indiana from 1998 to 2002 in Indiana
Figure 4.3 Percentages of Children with EBLLs in Muncie from 1998 to 2002 by census tract
Figure 4.4 Percentages of Children with EBLLs in Evansville from 1998 to 2002 by census tract
Figure 4.5 Percentages of Children with EBLLs in Indianapolis from 1998 to 2002 by census tract
Figure 4.6 Percentages of Children with EBLLs in Elkhart and Goshen from 1998 to 2002 by census tract
Figure 4.7 Percentages of Children with EBLLs in South Bend and Mishawaka from 1998 to 2002 by census tract
Figure 4.8 Percentages of Children with EBLLs in Fort Wayne from 1998 to 2002 by census tract
Figure 4.9 Percentages of Children with EBLLs in Northern Lake County in from 1998 to 2002 by census tract
Figure 4.10 Percentages of Children with EBLLs in Clarksville, New Albany, and Jeffersonville from 1998 to 2002 by census tract
Figure 4.11 Number of Screened and EBLLs Children in Muncie from 1998 to 2002
Figure 4.12 Number of Screened and EBLLs Children in Evansville from 1998 to 2002
Figure 4.13 Number of Screened and EBLLs Children in Indianapolis from 1998 to 2002
Figure 4.14 Number of Screened and EBLLs Children in Elkhart, Goshen from 1998 to 2002
Figure 4.15 Number of Screened and EBLLs Children in South Bend and Mishawaka from 1998 to 2002
Figure 4.16 Number of Screened and EBLLs Children in Fort Wayne from 1998 to 2002
Figure 4.17 Number of Screened and EBLLs Children in Northern Lake County from 1998 to 2002
Figure 4.18 Number of Screened and EBLLs Children in Clarksville, New Albany and Jeffersonville from 1998 to 2002
Figure 4.19 Selected Census Tracts in Indiana
Figure 4.20 Test Homoscedasticity of State Model
Figure 4.21 Residuals by Census Tract Selected Areas in Indiana
Figure 4.22 Selected Census Tracts in Muncie
Figure 4.23 Test Homoscedasticity of Muncie Model
Figure 4.24 Residuals by Census Tract Selected Areas in Muncie
Figure 4.25 Selected Census Tracts in Evansville
Figure 4.26 Test Homoscedasticity of Evansville Model
Figure 4.27 Residuals by Census Tract Selected Areas in Evansville
Figure 4.28 Selected Census Tracts in Indianapolis
Figure 4.29 Test Homoscedasticity of Indianapolis Model
Figure 4.30 Residuals by Census Tract Selected Areas in Indianapolis
Figure 4.31 Selected Census Tracts in Elkhart and Goshen
Figure 4.32 Test Homoscedasticity of Elkhart and Goshen Model
Figure 4.33 Residuals by Census Tract Selected Areas in Elkhart and Goshen
Figure 4.34 Selected Census Tracts in South Bend and Mishawaka
Figure 4.35 Test Homoscedasticity of South Bend and Mishawaka Model
Figure 4.36 Residuals by Census Tract Selected Areas in South Bend and Mishawaka
Figure 4.37 Selected Census Tracts in Fort Wayne
Figure 4.38 Test Homoscedasticity of Fort Wayne Model
Figure 4.39 Residuals by Census Tract Selected Areas in Fort Wayne
Figure 4.40 Selected Census Tracts in Northern Lake County
Figure 4.41 Test Homoscedasticity of Northern Lake County Model
Figure 4.42 Residuals by Census Tract Selected Areas in Northern Lake County
Figure 4.43 Selected Census Tracts in Clarksville, New Albany and Jeffersonville
Figure 4.44 Test Homoscedasticity of Clarksville, New Albany and Jeffersonville Model
Figure 4.45 Residuals by Census Tract Selected Areas in Clarksville, New Albany and Jeffersonville
Figure 7.1 Histogram of residual of dependent variable in Indiana State
Figure 7.2 Distribution of dependent variable to each independent variable in Indiana
Figure 7.3 Histogram of dependent variable in Muncie
Figure 7.4 Distribution of dependent variable according to each independent variable in Muncie
Figure 7.5 Histogram of residual of dependent variable in Evansville
Figure 7.6 Distribution of dependent variable according to each independent variable in Evansville
Figure 7.7 Histogram of residual of dependent variable in Indianapolis
Figure 7.8 Distribution of dependent variable according to each independent variable in Indianapolis
Figure 7.9 Histogram of residual of dependent variable in Elkhart and Goshen
Figure 7.10 Histogram of residual of dependent variable in South Bend and Mishawaka
Figure 7.11 Distribution of dependent variable according to each independent variable in South Bend and Mishawaka
Figure 7.12 Histogram of residual of dependent variable in Fort Wayne
Figure 7.13 Distribution of dependent variable according to each independent variable in Fort Wayne
Figure 7.14 Histogram of residual of dependent variable in Northern Lake County
Figure 7.15 Distribution of dependent variable according to each independent variable in Northern Lake County
Figure 7.16 Histogram of residual of dependent variable in Clarksville, New Albany and Jeffersonville
Figure 7.17 Distribution of dependent variable according to each independent variable in Clarksville, New Albany and Jeffersonville

1. INTRODUCTION

1.1 Overview

Exposure to lead is a significant concern for human health because lead damages the central nervous system and impairs learning and behavior even at low levels (Canfields et al., 2003). Lead is present in many items encountered in daily life.
For example, lead may exist in older houses that contain layers of lead paint. Soils can be another source of lead as a result of historical deposition from automobile exhaust, lead arsenate pesticide, and industrial or incinerator emissions (CDC, 1997). Lead may also come from ceramic ware or from occupational hazards such as lead mining.

Young children are more vulnerable to lead than adults for two reasons. First, they may ingest contaminated dust and soil as a result of normal mouthing activity. Second, they take in more lead as a proportion of body mass and absorb more lead than adults do (Mushak, 1992). Excessive absorption of lead may cause many problems for young children, such as learning and behavioral disorders, decreased mental ability and intelligence quotient (IQ), hearing impairment, delayed development, decreased attention span, delinquency and criminal behavior, and other nervous system problems (Oyana and Margai, 2007). At very high levels, lead can even cause coma, convulsions, and death in children (Rappazzo et al., 2007).

1.2 Lead Problem in United States

In the United States, the cost of health effects associated with lead exposure is estimated to be $43.4 billion each year, which is much more than the costs of other childhood diseases of environmental origin (Landrigan et al., 2002). According to the 2003-2004 National Health and Nutrition Examination Survey (NHANES), 2.3% of children between one and five years old in the United States, which is more than 500,000 children, have blood lead levels (BLLs) at or above the Centers for Disease Control and Prevention (CDC) blood action level of 10 ug/dl (Kim et al., 2008). The federal government’s response to the lead problem was to lower the recommended screening BLL from 60 ug/dl in the 1960s to 40 ug/dl in 1971, 30 ug/dl in 1978, 25 ug/dl in 1985, and finally 10 ug/dl in 1991 (Oyana and Margai, 2007).
In addition, the 1978 enactment of the Lead-Based Paint Poisoning Prevention Act banned the use of lead-based paint nationally. However, a large number of houses built before 1978 remain, and many of them were built before 1950. Lead is released when this paint deteriorates or is spread during renovation. Historical deposition from automobile exhaust and from factories that emit lead still affects children living in adjacent areas.

According to the Indiana State Department of Health (2004), the most common source of lead exposure for children was lead-based paint in older houses, especially houses built prior to 1950. Indiana has 71,711 houses built prior to 1950, ranking 11th in the nation. Indiana’s proportion of housing built prior to 1950 is 28.3%, which is higher than the national average of 22.3%. Children who live in these old houses are likely to be affected by the lead in and around them.

1.3 Progress in Lead Prevention Research

In order to reduce elevated BLLs in children, the CDC provided grants to state and local agencies for screening children's blood lead levels. For example, in 2003, the CDC provided $31.7 million to 42 states and local health departments to develop and implement comprehensive lead poisoning prevention efforts. Margai (1998) claimed that even though the CDC recommended universal screening, not all children were being tested. As a result, neighborhoods and geographic clusters of highly exposed individuals would not be screened well. Margai also pointed out the need for an efficient lead monitoring and prevention plan. GIS can be used to map and evaluate childhood lead risk areas.
According to recent studies in New York and North Carolina, the level of lead risk was linked with the age of housing, house value, percentage of renter-occupied houses, percentage of children in poverty, percentage of one-parent households, household median income, percentage of African American residents, and percentage of Hispanic residents. Several statistical models have been built based on these variables. In addition, some research focused on how to map lead exposure risk areas at different resolution levels. For example, some researchers used the census block, census tract, or tax parcel as the basic resolution to create GIS models for directing childhood lead poisoning prevention. However, only a few of these studies focused on the geographic factors in these models.

1.4 Significance of the Study

In this study, the key is to determine which parameters, based on selected urban areas, are the best predictors of elevated blood lead levels (BLLs) in children. Differences in model parameters will be examined for different city sizes and geographical locations. In order to compare these models, it is necessary to build them using the same procedure and standard.

1.5 Research Objective

The primary purpose of this study is to identify the effects of the size and location of different urban areas on the parameters in the statistical models. The specific objectives of this thesis are:

1) To describe the elevated BLL changes in Indiana from 1998 to 2002.
2) To standardize a method to create statistical models to predict census tracts with high percentages of elevated BLLs.
3) To compare and examine the models from urban areas of different size and location based on the parameters of the generated models.
4) To compare the models generated for urban areas to a state-level model.
5) To create residual maps based on the difference between the values generated by the models and the actual screened values.

2. LITERATURE REVIEW

2.1 Factors Associated with Lead Risk

Cromley and McLafferty (2002, p. 9) claimed that “if the cause of the disease is believed to be environmental, then we would expect disease risk to be higher in those geographical areas where environmental risk is higher.” The level of lead risk has been linked with socio-economic factors and the age of housing stock. In their research on childhood blood lead levels in New York State, Talbot et al. (1998) linked a high prevalence of elevated BLLs with areas of older housing stock, a smaller proportion of high school graduates, and a larger proportion of black births. A follow-up study of blood lead levels in New York State children born between 1994 and 1997 found the same relationship (Haley and Talbot, 2004). Dwyer (1998) used the age of housing stock, land use, and road distance to evaluate risk areas in Australia. Bruenling et al. (1999) tested the dietary calcium intake of urban children and claimed that both lead exposure and low dietary calcium pose significant health risks to urban minority children.

Because the demolition of old houses could be a source of lead, researchers studied dust caused by demolition. Farfel et al. (2003) studied dust-fall samples collected from fixed locations within ten meters of three demolition sites to describe lead dust changes in the surrounding environment. In addition, lead dust can also spread through remedial or removal activities at Superfund sites. Khoury and Diamond (2003) studied modeling approaches for assessing potential risk to children from air lead emissions from remedial or removal activities at Superfund sites in West Dallas, Texas. They used the Environmental Protection Agency (EPA) Integrated Exposure Uptake Biokinetic (IEUBK) model and the International Commission on Radiological Protection (ICRP) lead model to simulate blood lead concentrations in children.
These studies examined the sources of lead at demolition and Superfund sites and revealed the variables that affected the residences surrounding these sites.

Other researchers have studied the seasonality of children’s blood lead levels. Laidlaw et al. (2005) explored the temporal relationship between children’s blood lead levels and weather, soil moisture, and dust in Indianapolis, Indiana; Syracuse, New York; and New Orleans, Louisiana. In this research, the average children’s blood Pb (BPb) concentration in each city was computed from the children’s BPb measurements for each month and regressed against the independent variables of average monthly soil moisture, particulate matter < 10 µm in diameter, wind speed, and temperature. The results showed that the seasonal resuspension of Pb-contaminated soil in urban atmospheres is controlled by soil moisture and climate fluctuations. Higher urban atmospheric Pb loading rates are present during periods of low soil moisture and within areas of Pb-contaminated surface soils.

Researchers have also focused on changes in children’s blood lead levels related to compliant housing. Rappazzo et al. (2007) found that, in Philadelphia, blood lead level changes were not significantly different between children in compliant housing and those living in noncompliant housing for periods of 1.5 to 2 years, 2 to 3 years, or more than 3 years. This study also pointed out that many factors might influence blood lead levels, including the age of the subject, gender, season, the time of the test, the diet of the subject immediately before the test, and the possible presence of lead on the subject’s skin. Blood lead results do not reflect the total body burden of lead, considering lead’s half-life of 30 to 60 days in the blood.
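
The half-life remark can be made concrete with a small calculation. The sketch below is illustrative only: it assumes simple first-order (exponential) clearance of lead from blood, which is a simplification that the cited study does not state, and the function name `fraction_remaining` and the 90-day interval are hypothetical choices. Only the 30 and 60 day half-lives come from the text above.

```python
def fraction_remaining(days_elapsed: float, half_life_days: float) -> float:
    """Fraction of an initial blood lead amount left after days_elapsed.

    Assumes first-order (exponential) clearance, a simplifying assumption
    made for this illustration only.
    """
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    if days_elapsed < 0:
        raise ValueError("days_elapsed must be non-negative")
    return 0.5 ** (days_elapsed / half_life_days)


# Half-lives of 30 and 60 days bracket the range quoted in the text.
for half_life in (30.0, 60.0):
    remaining = fraction_remaining(90.0, half_life)
    print(f"half-life {half_life:.0f} d: {remaining:.1%} remains after 90 d")
# prints:
# half-life 30 d: 12.5% remains after 90 d
# half-life 60 d: 35.4% remains after 90 d
```

Under this assumption, a test taken three months after an exposure ends would capture only a fraction of the original blood lead, which is consistent with the point that a single blood test reflects recent exposure rather than total body burden.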
2.2 Current Models of Lead Risk

Because screening databases can be highly biased in representing the geographical distribution of a health problem (Cromley and McLafferty, 2002), choosing screening areas becomes important in building models of lead risk to children. Factors that affect results include the geographic resolution and the choice of regression method.

2.2.1. Geographic Resolution

Geographic resolution can affect the model results in different ways. Fine geographic resolutions such as census blocks or tax parcels can increase geographic accuracy but reduce the significance of the model; coarse geographic resolutions such as census tracts or zip code areas are crude in geographic accuracy but contribute to the significance of the model. Researchers have used different geographic resolutions such as zip code areas, census tracts, or tax parcels. There is no standard for judging which resolution is best because of differences in data collection methods, procedures, and accessibility.

Zip code areas or merged zip code regions are chosen in many studies because most data from health departments are geocoded at the zip code level. Talbot et al. (1998) used merged zip code regions as the units of observation in their research on children's blood lead levels in New York. Haley and Talbot (2004) extended this research, again using merged zip code regions. Using zip code areas or merged zip code regions minimizes the error created through transformation to other resolutions such as census blocks or census tracts, whose boundaries do not completely match the boundaries of zip code areas. However, compared with zip code areas, census tracts are more sensitive in pinpointing a greater number of older housing units because socio-economic factors are more similar within a census tract than within the larger zip code area. Reissman et al. (2001) reached these conclusions in their analysis of childhood BPb levels and residential locations of at-risk children screened from 1996 through 1997, and of the number and location of homes where more than one child had been poisoned by lead from 1994 through 1998, in Jefferson County, Kentucky.

Griffith et al. (1998) compared census blocks, census block groups, and census tracts in research on childhood blood lead levels in Syracuse, New York. The results indicated that census tracts and census block groups are suitable resolutions for building sound, model-based statistical inferences. However, the census block level of aggregation is too sparse to achieve satisfactory statistical models.

The tax parcel is also used as the geographic resolution in some studies. Miranda et al. (2002) used tax parcels to study the potential lead risk for children in North Carolina. Kim et al. (2008) built three childhood lead exposure risk models based on tax parcels. They claimed that “the highly resolved models allow communities to target the highest-risk homes more cost-effectively and to create and implement targeted intervention programs” (p. 1735).

Increased resolution can be achieved at the point level, which locates each of the screening locations. An example is the Robert et al. (2003) study of old housing and lead screening in Charleston County, South Carolina. They found that children living in pre-1950 housing were at higher risk for lead poisoning, and they also found a large number of cases in an area of newer houses that was near a potential point source of lead.

2.2.2. Analysis Methods

In recent studies of children with elevated BLLs, different regression methods were used to build lead risk assessment models. One of these methods is least squares regression. Talbot et al. (1998) used this method to examine children’s BLLs and community characteristics.
They used percent of houses built before 1940, percent of houses built before 1950, percent of houses vacant, adults age 25 and older who graduated from high school, percent of children under 5 years living below the poverty level, percent of Hispanic, percent of Black, percent of population that rents a home, and population density as independent variables. The log-transformed percentage of children with elevated BLLs in each zip code was chosen as the dependent variable. In addition, all zip codes with fewer than 100 children tested were merged to create new zip code regions. The analysis process included examining the shape and effect of the bivariate association of each variable with the dependent variable and using diagnostic methods to detect model errors. Haley and Talbot (2004) extended the research by building a simultaneous autoregressive model.

Another example is the research on childhood lead poisoning in North Carolina. Miranda et al. (2002) used log-linear regression to generate models for six counties in North Carolina. They used ANOVA to drop three variables, thereby using a total of six predictors. The dropped variables were percentage of children in poverty, percentage of one-parent households, and percentage of renter-occupied housing.

Logistic regression methods are also used in building models for evaluating childhood blood lead risk areas. Oyana and Margai (2007) used logistic regression models to approximate the risks of childhood lead poisoning in six neighborhoods in Chicago. The Northside neighborhood of Chicago was used as a reference area in the procedure, and the results showed that the Westside neighborhoods faced the greatest risk of lead poisoning.

In addition to statistical analysis, researchers have also employed spatial and geostatistical methods.
An example is Margai (1998), who used GIS in Binghamton, New York, to generate lead case clusters; buffers of environmental sources, including factories and other lead-related facilities; buffers for automobile-related facilities such as gas storage; and buffers of environmental pathways such as rail corridors. Spatial analysis found that nearly six out of ten lead case clusters were within these buffers. In addition, geographic factors and demographic variables were combined using canonical coefficients to delineate high-risk areas for lead poisoning. Oyana and Margai (2007) used kriging to predict unknown values from observed data at known locations using a fitted semivariogram model. The kriged maps provided a smooth surface of locations with high prevalence rates and clearly captured the overall spatial pattern of decline. Haley and Talbot (2004) used a spatial error model to test four different weight matrices: first-order neighbors, second-order neighbors, inverse distance to 25 km, and inverse distance squared to 25 km.
2.2.3. Extended Analysis of Current Models
Current researchers have created models that focus on areas such as an entire state, certain counties, or cities. For example, Talbot et al. (1998) generated a model based on New York State, Miranda et al. (2002) generated models based on six selected counties in North Carolina, and Griffith et al. (1998) generated models based on Syracuse, New York. No researchers have linked the city or county models with a state model or related models created from different urban areas. Given this situation, an extended analysis of current models would test whether a state-level model is suitable for all urban areas, or whether a model generated for one urban area can be used for another. Another extension lies in examining the differences among the models based on the geographic location and size of the urban areas.
3. DATA, METHODOLOGY, PROCEDURES
3.1 Data
3.1.1.
Study Area
The primary study area of this thesis is Indiana. Indiana is located in the Midwest region of the United States. It borders Illinois in the west, Ohio in the east, Kentucky in the south, and Michigan in the north. There are thirteen urban areas in Indiana (Figure 3.1). According to the 2000 U.S. Census, the population of Indiana is 6,080,485 and the number of children under six years old is 595,896, which is about 9.8 percent of the total population.
3.1.2. Description of the Data
The data for this thesis are composed of three parts. The first part is the socio-economic data sourced from the 2000 U.S. Census Summary File 1 (SF1, 100-Percent Data) and Summary File 3 (SF3), which is based on sample data. The second part is children’s BLLs information from 1998 to 2002. These data are from the Indiana State Department of Health and are aggregated by census tract. The third part is a digital map of Indiana, which comes from the Indiana Geological Survey.
Figure 3.1: Urban Areas in Indiana
3.1.3. Preparation of the Data
3.1.3.1. US Census SF1 and SF3 Data
US census data SF1 contains most of the population information but lacks details such as the age of buildings; US census data SF3 provides this information. Social, economic, and housing information was extracted from both SF1 and SF3 using the software DataFerrett with 2000 US Census data. Both SF1 and SF3 contain census tract data. The census tract number in SF1 was used as a foreign key, and the census tract number from SF3 was used as the primary key. The Join Table function in ArcGIS was used to link these two tables, generating a new table that contained information from both. Based on the new table, some parameters were calculated. Table 3.1 and Figure 3.2 show this procedure.
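The join-and-divide step described above could be sketched as follows. This is a minimal illustration in plain Python rather than the ArcGIS Join Table workflow actually used in the thesis; the field names (`tract`, `total_pop`, `black_pop`, `total_housing`, `housing_pre1950`) and the sample values are hypothetical.

```python
def join_by_tract(sf1_rows, sf3_rows):
    """Inner-join two lists of dicts on the shared 'tract' key."""
    sf3_by_tract = {row["tract"]: row for row in sf3_rows}
    joined = []
    for row in sf1_rows:
        match = sf3_by_tract.get(row["tract"])
        if match is not None:
            # Merge the SF1 and SF3 fields into one record per tract.
            joined.append({**row, **match})
    return joined


def ratio(dividend, divisor):
    """Divide as in Table 3.1; return None when the divisor is zero."""
    return dividend / divisor if divisor else None


# Hypothetical sample records, not taken from the census files.
sf1 = [{"tract": "000100", "total_pop": 4000, "black_pop": 1000}]
sf3 = [{"tract": "000100", "total_housing": 1600, "housing_pre1950": 400}]

joined = join_by_tract(sf1, sf3)
for rec in joined:
    rec["ratio_black"] = ratio(rec["black_pop"], rec["total_pop"])
    rec["ratio_pre1950"] = ratio(rec["housing_pre1950"], rec["total_housing"])
```

Returning `None` for a zero divisor is a design choice of this sketch; it keeps empty tracts visible instead of raising an error during the batch calculation.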
Table 3.1: Calculation of the Socio-Economic Information
Dividend | Divisor | Result
Total Black Population | Total Population | The Ratio of Black
Total Hispanic Population | Total Population | The Ratio of Hispanic
Rental Housing | Total Housing | The Ratio of Rental Housing
Vacant Housing | Total Housing | The Ratio of Vacant Housing
Total Population of High School or above Education | Total Housing | The Ratio of High School or above Education
Housing Built before 1950 | Total Housing | The Ratio of Housing Built before 1950
Housing Built before 1980 | Total Housing | The Ratio of Housing Built before 1980
One Parent Families | Total Families | The Ratio of One Parent Family
Families in Poverty having Children under Five-Years-Old | Total Families | The Ratio of Children under Five Years Living below the Poverty Level
Families in Poverty having Children under Eighteen-Years-Old | Total Families | The Ratio of Children under Eighteen Years Living below the Poverty Level

Figure 3.2: Preparation of US Census SF1 and SF3 Data (flowchart: fields extracted from US Census SF1 and SF3 are combined into the ten ratios of Table 3.1, together with Average Household Size and Average Family Size)
3.1.3.2. Preparation of Children’s BLLs Data
The children’s BLLs data for Indiana were collected by the Indiana State Department of Health. The data include screening information from 1998 to 2002 for each census tract.
The geocoding procedure includes two steps: first, the address record is geocoded to an interpolated street range; second, if the address record is not matched to a street, it is mapped to a zip code centroid. Each record contains the number of screened children and the number of children with BLLs of 10 µg/dL or higher for each tract between 1998 and 2002. Figure 3.3 illustrates the procedure for children’s BLLs data.
Figure 3.3: Preparation of Children’s Blood Lead Levels (BLLs) data (flowchart elements: US Census SF3, Total Children under Six; Children BLLs Information, Num of Screened Children, Num of BLLs >=10ug/dl; outputs: Percentage of Screened, Percentage of elevated blood lead levels (EBLLs))
3.1.3.3. Preparation of GIS Data
The GIS data were obtained from the Indiana Geological Survey. The data included US census tracts and urban areas in Indiana. A personal GeoDatabase was created using ArcCatalog. The GIS data, the related social, economic, and housing information, and the children’s BLLs were imported into the GeoDatabase.
3.1.4. Software
The software used in this research includes ArcGIS 9.3, SPSS, DataFerrett, Microsoft Excel, and Microsoft Access. DataFerrett was used for searching and extracting 2000 US Census data. ArcGIS 9.3 was used to prepare the GeoDatabase and create maps. Microsoft Excel was used to create charts. Microsoft Access was used to extract the parameter fields for analysis. SPSS was used to generate and test the regression models.
3.2 Methodology
3.2.1. Selection of Geographic Resolution
Previous researchers have used the geographic resolution of zip code areas, census tracts, and tax parcels. In this research, the census tract is chosen as the geographic resolution because socio-economic factors are more similar within a census tract than within the larger area of a zip code (Reissman, 2001), and the purpose of this research is to build models by linking the socio-economic factors and age of housing stock with children’s elevated BLLs.
3.2.2.
Selection of Urban Areas
According to the 2000 US Census, there are thirteen major urban areas in Indiana. Selection of the study areas is based on three factors: population of the urban areas, location of the urban areas, and number of children screened in the urban areas. The population and locations are listed in Table 3.2.
Table 3.2: Population and location of urban areas in Indiana
Name | Location | Population
Anderson | Central | 59,734
Bloomington | Central | 69,291
Clarksville-New Albany-Jeffersonville | Southern | 113,588
Elkhart-Goshen | Northern | 81,257
Evansville | Southern | 121,582
Fort Wayne | Northern | 244,296
Indianapolis | Central | 791,926
Kokomo | Central | 46,113
Lafayette-West Lafayette | Central | 85,175
Muncie | Central | 67,430
South Bend-Mishawaka | Northern | 219,361
Northern Lake County | Northern | 468,335
Terre Haute | Central | 59,614

Because the number of children screened was not uniform, the ratio of children screened was calculated by dividing the number of children screened by the number of children five years old and under for each census tract in each year. All census tracts with a ratio of children screened below two percent in any of the five years were filtered out. For example, if there were one hundred children under five years old in a census tract and the number of children screened in 2000 was one, the census tract would be deleted from this study. In order to provide sufficient data for census tracts with zero EBLL children in all five years, a higher filter was set for the ratio of children screened: only those census tracts with a ratio of children screened equal to or above five percent were selected for analysis. That is, if there were one hundred children, the number of children screened was four in 1999, and the number of EBLL children in each year was zero, the census tract would be deleted from this study. The procedure for selecting urban areas is shown in Figure 3.4.
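The two screening filters described above can be sketched as a small function. This is a minimal illustration, not the procedure as implemented in the thesis; the function name and argument layout are hypothetical, and it assumes that the five percent rule, like the two percent rule, is applied to each year separately, which is how the 1999 example reads.

```python
def keep_tract(children, screened_by_year, ebll_by_year):
    """Return True if a census tract passes both screening-ratio filters.

    children         -- number of children five years old and under
    screened_by_year -- children screened in each of the five years
    ebll_by_year     -- children with BLLs >= 10 ug/dL in each year
    """
    if children <= 0:
        return False  # no denominator, so the ratio cannot be computed
    ratios = [100.0 * s / children for s in screened_by_year]
    # Filter 1: drop the tract if any year falls below two percent.
    if any(r < 2.0 for r in ratios):
        return False
    # Filter 2 (assumed per-year): a tract with zero EBLL children in all
    # years must reach at least five percent screened in every year.
    if sum(ebll_by_year) == 0 and any(r < 5.0 for r in ratios):
        return False
    return True


# The two worked examples from the text, with hypothetical values for
# the years that the text does not specify.
first_example = keep_tract(100, [10, 10, 1, 10, 10], [1, 0, 0, 0, 0])
second_example = keep_tract(100, [10, 4, 10, 10, 10], [0, 0, 0, 0, 0])
```

Both examples evaluate to `False`, matching the statement that those tracts would be deleted.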
Figure 3.4: Procedure for selecting urban areas (flowchart: census tracts are selected by location within urban areas, joined with the screen ratio table, filtered on a ratio of children screened < 2%, and then filtered on zero EBLL children (Pb >= 10 ug/dL) through five years with a ratio of children screened < 5%, yielding the preselected urban areas)
As a result, eight urban areas are selected from Indiana. The selected urban areas are South Bend and Mishawaka; Northern Lake County; Elkhart and Goshen; Fort Wayne; Indianapolis; Muncie; Evansville; and Clarksville, New Albany and Jeffersonville. These urban areas are shown in Figure 3.5.
Figure 3.5 Selected urban areas in Indiana
3.2.3. Model Building
In this research, stepwise and backward elimination methods are used to choose the independent variables. Least squares regression methods are used to build evaluation models for the selected cities in Indiana.
3.2.3.1. Selection of Independent and Dependent Variables
The selection of the dependent variable involves considering whether the distribution of the variable is normal (Talbot et al., 1998; Miranda et al., 2002; Haley and Talbot, 2004). In this research, the dependent variable is the natural logarithm of the percentage of children with BLLs >= 10 ug/dL in each census tract plus one. The independent variables were chosen from Ratio of Black, Ratio of Hispanic, Ratio of Rental Housing, Ratio of Vacant Housing, Ratio of High School or above Education, Ratio of Housing Built before 1950, Ratio of Housing Built before 1980, Ratio of One Parent Family, Ratio of Children under Five Years Living below the Poverty Level, Ratio of Children under Eighteen Years Living below the Poverty Level, Average Household Size, and Average Family Size. For each urban area, all twelve independent variables were considered, and the most suitable ones were selected to build the model for that urban area.
When selecting the independent variables, the following criteria were used:
1. The correlation between the independent variables and the dependent variable.
2. The explanatory power of the independent variables (Miranda et al., 2002).
3. The significant interactive effects among independent variables (Miranda et al., 2002).
3.2.3.2. Least Squares Regression
Figure 3.6: Least squares regression line.
The least squares regression line is the line that minimizes the sum of squared vertical distances between each data point and the line (Figure 3.6). When constructing a least squares regression, four criteria need to be considered (McGrew and Monroe, 2000):
1. The variables are assumed to have a linear relationship.
2. For every value of the independent variable, the distribution of residuals or error values should be normal, and the mean of the residuals should be zero.
3. For every value of the independent variable, the variance of the residual error is assumed to be equal.
4. The value of each residual is independent of all other residual values.
3.2.3.3. Testing the Models
There are two methods to test the least squares regression models. The first is to use statistical criteria, and the second is to draw a residuals map that displays the error patterns. In this research, both methods are used. The statistical criteria include the coefficient of multiple determination (R2), which is used to interpret the percentage of variation explained, and homoscedasticity, which describes whether the variance of the residuals is homogeneous across levels of the predicted values. A value is calculated for the coefficient of multiple determination (R2). In addition, an F test (or F statistic) is used to evaluate the significance of R2.
If the probability of the F value is sufficiently small (that is, it has a small p value), one can conclude that the independent variables account for a significant amount of the total variation in the dependent variable (McGrew and Monroe, 2000). For homoscedasticity, a chart of regression standardized residuals against regression standardized predicted values is created for the test.
4. RESULTS
Results of the analysis, which include the maps, charts, and models, are presented in this chapter. More information on the model results is listed in Appendix A. This information includes the test of normal distribution of the dependent variable for each model, R2 for the model, and the distribution of the dependent variable against each independent variable.
4.1 Childhood Lead Screening in Indiana through Five Years
The number of children screened, the number of children with BLLs above 10 ug/dL, the number of children with BLLs above 20 ug/dL, and the percentages from 1998 to 2002 are shown in Figures 4.1 and 4.2.
Figure 4.1: Number of Screened and EBLLs Children in Indiana from 1998 to 2002.
Figure 4.2: Percentage of Children Screened with EBLLs in Indiana from 1998 to 2002
4.2 Children BLLs in Selected Urban Areas
Figures 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, and 4.10 illustrate the percentage of children with elevated blood lead levels (BLLs >= 10 ug/dL) for census tracts in the selected urban areas of Indiana from 1998 to 2002. Figures 4.11, 4.12, 4.13, 4.14, 4.15, 4.16, 4.17, and 4.18 illustrate the number of screened children and the number of children with EBLLs in the selected urban areas from 1998 to 2002. These figures display the change in the percentage of EBLLs among the screened children from 1998 to 2002 for the eight selected urban areas. For most urban areas, the percentage of EBLLs declined over time. However, in some urban areas the value increased. One reason for an increase in the percentage of EBLLs is a reduction in the number of children screened.
For example, the 2002 increase in the percentage of EBLLs in South Bend and Mishawaka is probably caused by the reduction in the number of children screened. Because the value of the dependent variable influences the selection of independent variables and parameters in the models, it is necessary to test whether the reduction in the number of children screened affects the generated models. If it does, these values need to be excluded before building the model.
Figure 4.3 Percentages of Children with EBLLs in Muncie from 1998 to 2002 by census tract
Figure 4.4 Percentages of Children with EBLLs in Evansville from 1998 to 2002 by census tract
Figure 4.5 Percentages of Children with EBLLs in Indianapolis from 1998 to 2002 by census tract
Figure 4.6 Percentages of Children with EBLLs in Elkhart and Goshen from 1998 to 2002 by census tract
Figure 4.7 Percentages of Children with EBLLs in South Bend and Mishawaka from 1998 to 2002 by census tract
Figure 4.8 Percentages of Children with EBLLs in Fort Wayne from 1998 to 2002 by census tract
Figure 4.9 Percentages of Children with EBLLs in Northern Lake County from 1998 to 2002 by census tract
Figure 4.10 Percentages of Children with EBLLs in Clarksville, New Albany, and Jeffersonville from 1998 to 2002 by census tract
Figure 4.11: Number of Screened and EBLLs Children in Muncie from 1998 to 2002
Figure 4.12: Number of Screened and EBLLs Children in Evansville from 1998 to 2002
Figure 4.13: Number of Screened and EBLLs Children in Indianapolis from 1998 to 2002
Figure 4.14: Number of Screened and EBLLs Children in Elkhart and Goshen from 1998 to 2002
Figure 4.15: Number of Screened and EBLLs Children in South Bend and Mishawaka from 1998 to 2002
Figure 4.16: Number of Screened and EBLLs Children in Fort Wayne from 1998 to 2002
Figure 4.17: Number of Screened and EBLLs Children in Northern Lake County from 1998 to 2002
Figure 4.18: Number of Screened and EBLLs Children in
Clarksville, New Albany and Jeffersonville from 1998 to 2002
4.3 Independent Variables
In the process of selecting the independent variables, one needs to consider the normal distribution of the dependent variable and the three criteria for independent variable selection given in Chapter Three. The dependent variable is calculated using the following equation:
\[
\ln(\text{Percentage of EBLLs} + 1) = \ln\left( \frac{\sum \text{Number of EBLLs in Each Year}}{\sum \text{Number of Screened Children in Each Year}} \times 100 + 1 \right)
\]
SPSS was used to check the normal distribution of the residuals of the dependent variable. Each of the twelve independent variables mentioned in Chapter Three was calculated by averaging its value over the five years. A correlation table with all the independent and dependent variables was generated using SPSS. In addition, two methods were used to select the independent variables. One is the stepwise procedure. At each step, the independent variable not in the equation that has the smallest probability of F is entered, provided that probability is sufficiently small. Variables already in the regression equation are removed if their probability of F becomes sufficiently large. The method terminates when no remaining variables are eligible for inclusion or removal. The other method is backward elimination, in which all variables are entered into the equation and then removed sequentially. The variable with the smallest partial correlation with the dependent variable is considered first for removal. If it meets the criterion for elimination, it is removed. After the first variable is removed, the variable remaining in the equation with the smallest partial correlation is considered next. The procedure stops when there are no variables in the equation that satisfy the removal criteria (SPSS User’s Guide, 2006).
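The dependent-variable equation given earlier in this section can be sketched in a few lines. This is a minimal illustration under the assumption that the yearly counts for a tract are available as two lists; the function name and the sample counts are hypothetical.

```python
import math


def log_ebll_percentage(ebll_by_year, screened_by_year):
    """Return Ln(percentage of EBLLs + 1), pooling the counts over all years."""
    total_screened = sum(screened_by_year)
    if total_screened == 0:
        raise ValueError("no children screened; the percentage is undefined")
    percentage = 100.0 * sum(ebll_by_year) / total_screened
    # Adding one keeps the logarithm defined for tracts with zero EBLL cases.
    return math.log(percentage + 1.0)


# Hypothetical tract: 5 EBLL children among 100 screened over five years.
value = log_ebll_percentage([1, 2, 0, 1, 1], [20, 20, 20, 20, 20])
```

Pooling the five years before dividing, rather than averaging five yearly percentages, follows the summation signs in the equation.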
4.4 Test the Model with Different Screened Data
Among Figures 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, and 4.10, Figure 4.7 most clearly shows an increase in the percentage of children with EBLLs from 1998 to 2002. Further inspection of the data found that the increase was probably due to a reduction in the number of children screened. Therefore, the South Bend and Mishawaka urban area was chosen to test the model. Two models were generated using stepwise and backward elimination. One used the data from all years. The other excluded the year 2002. The resulting models are given below:
Ln (Ratio of EBLLs Children + 1) = - 0.373 + 7.392 * (Ratio of Vacant Housing) + 0.661 * (Average Family Size)
Ln (Ratio of EBLLs Children + 1) = - 0.432 + 8.189 * (Ratio of Vacant Housing) + 0.633 * (Average Family Size)
According to the results, eliminating 2002 does not introduce new variables; it only changes the parameters slightly. Therefore, all five years are included in the model building procedure.
4.5 Model of Indiana
The selected census tract areas in Indiana are shown in Figure 4.19. Two models are generated by the stepwise and backward elimination methods using SPSS, and the results are the same:
Ln (Ratio of EBLLs Children + 1) = - 1.223 - 0.762 * (Ratio of Education with High School and Above) + 2.099 * (Average Family Size) - 0.997 * (Average Household Size) - 1.171 * (Ratio of Housing Built before 1980) + 2.019 * (Ratio of Housing Built before 1950) + 0.573 * (Ratio of Children under Eighteen Years Living below the Poverty Level)
Figure 4.19 Selected Census Tracts in Indiana
The R2 value for this model is 0.558. Figure 4.20 shows the test of the homoscedasticity of the model. Figure 4.21 shows the residuals of the model. Other results are listed in Appendix A.
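As an illustration of how the state model equation can be read, the sketch below evaluates it for one tract. The coefficients are copied from the equation in Section 4.5; the dictionary keys and the example tract values are hypothetical and are not drawn from the thesis data.

```python
import math

# Coefficients copied from the state model equation in Section 4.5.
INTERCEPT = -1.223
COEFFICIENTS = {
    "ratio_high_school_or_above": -0.762,
    "average_family_size": 2.099,
    "average_household_size": -0.997,
    "ratio_housing_pre_1980": -1.171,
    "ratio_housing_pre_1950": 2.019,
    "ratio_children_under_18_in_poverty": 0.573,
}


def predict_log_value(tract):
    """Evaluate the right-hand side of the model for one tract."""
    return INTERCEPT + sum(COEFFICIENTS[k] * tract[k] for k in COEFFICIENTS)


def back_transform(log_value):
    """Invert the Ln(x + 1) transformation of the dependent variable."""
    return math.exp(log_value) - 1.0


# Hypothetical tract, for illustration only.
example_tract = {
    "ratio_high_school_or_above": 0.8,
    "average_family_size": 3.0,
    "average_household_size": 2.5,
    "ratio_housing_pre_1980": 0.7,
    "ratio_housing_pre_1950": 0.4,
    "ratio_children_under_18_in_poverty": 0.2,
}

log_value = predict_log_value(example_tract)
print(round(log_value, 4), round(back_transform(log_value), 2))
```

The same two functions apply to any of the urban-area models by swapping in that model's intercept and coefficients.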
Figure 4.20 Test Homoscedasticity of State Model
Figure 4.21 Residuals by Census Tract Selected Areas in Indiana
In Figure 4.21, areas where more children were observed than the model predicts have positive residuals and are shown in blue, while areas where fewer children were observed than the model predicts have negative residuals and are shown in red. Both kinds of areas are considered census tracts with high residuals.
4.6 Models of Selected Urban Areas
4.6.1. Muncie
The selected census tract areas in Muncie are shown in Figure 4.22. Two models are generated by the stepwise and backward elimination methods using SPSS for Muncie, as below:
Model generated by stepwise:
Ln (Ratio of EBLLs Children + 1) = 3.444 - 3.2999 * (Ratio of Housing Built before 1980)
Model generated by backward elimination:
Ln (Ratio of EBLLs Children + 1) = 0.848 + 6.297 * (Ratio of One Parent Family) + 7.304 * (Ratio of Vacant Housing) - 3.474 * (Ratio of Children under Eighteen Years Living below the Poverty Level)
Figure 4.22 Selected Census Tracts in Muncie
The second model is chosen because it has a higher R2 value, which is 0.644. Figure 4.23 shows the test of the homoscedasticity of the model. Figure 4.24 shows the residuals of the model. Other results are listed in Appendix A.
Figure 4.23 Test Homoscedasticity of Muncie Model
Figure 4.24 Residuals by Census Tract Selected Areas in Muncie
Using the same color pattern as the state model residuals, areas where more children were observed than the model predicts have positive residuals and are shown in blue, while areas where fewer children were observed than the model predicts have negative residuals and are shown in red. Both kinds of areas are considered census tracts with high residuals. Figures 4.27, 4.30, 4.33, 4.36, 4.39, 4.42, and 4.45 use the same color pattern.
Figure 4.24 shows that the census tracts with high residuals are located in the southern and western areas of Muncie, and two of them are in the center of Muncie. Most of these areas are rural, educational, or commercial areas. The average ratio of children screened was 23.18 in the census tracts with high residuals and 25.31 in the rest of the census tracts. Based on these observations, the Muncie model is overall more accurate in the census tracts with a high ratio of children screened.
4.6.2. Evansville
The selected census tract areas in Evansville are shown in Figure 4.25. Two models are generated by the stepwise and backward elimination methods using SPSS for Evansville, as below:
Model generated by stepwise:
Ln (Ratio of EBLLs Children + 1) = 0.757 + 10.46 * (Ratio of Vacant Housing)
Model generated by backward elimination:
Ln (Ratio of EBLLs Children + 1) = - 1.270 + 0.663 * (Average Household Size) + 9.408 * (Ratio of Vacant Housing) + 1.074 * (Ratio of Housing Built before 1950)
Figure 4.25 Selected Census Tracts in Evansville
The second model is chosen because it has a higher R2 value, which is 0.797. Figure 4.26 shows the test of the homoscedasticity of the model. Figure 4.27 shows the residuals of the model. Other results are listed in Appendix A.
Figure 4.26 Test Homoscedasticity of Evansville Model
Figure 4.27 Residuals by Census Tract Selected Areas in Evansville
Figure 4.27 shows the distribution of residuals by census tract in selected areas in Evansville. This figure uses the same color pattern as Figure 4.24. It shows that census tracts with high residuals are distributed in the northern and southern areas of Evansville. Reference to the remote sensing data of the local area showed that most of these places were rural, educational, or public areas.
The average ratio of children screened was 19.97 in the census tracts with high residuals and 18.47 in the rest of the census tracts.
4.6.3. Indianapolis
The selected census tract areas in Indianapolis are shown in Figure 4.28. Two models are generated by the stepwise and backward elimination methods using SPSS, as shown below:
Model generated by stepwise:
Ln (Ratio of EBLLs Children + 1) = - 1.08 - 1.168 * (Ratio of Housing Built before 1980) + 1.879 * (Ratio of Children under Eighteen Years Living below the Poverty Level) + 1.586 * (Ratio of Housing Built before 1950) + 1.1 * (Average Family Size) - 2.778 * (Ratio of One Parent Family)
Model generated by backward elimination:
Ln (Ratio of EBLLs Children + 1) = 0.867 - 0.667 * (Average Household Size) + 1.286 * (Average Family Size) - 1.285 * (Ratio of High School or above Education) + 1.319 * (Ratio of Housing Built before 1950) - 1.215 * (Ratio of Housing Built before 1980) + 1.626 * (Ratio of Children under Eighteen Years Living below the Poverty Level) - 1.953 * (Ratio of Children under Five Years Living below the Poverty Level)
Figure 4.28 Selected Census Tracts in Indianapolis
The second model is chosen because it has a higher R2 value, which is 0.698. Because the distribution of the residuals of the dependent variable of this model is not normal, this model is not statistically stable. Figure 4.29 shows the test of the homoscedasticity of the model. Figure 4.30 shows the residuals of the model. Other results are listed in Appendix A.
Figure 4.29 Test Homoscedasticity of Indianapolis Model
Figure 4.30 Residuals by Census Tract Selected Areas in Indianapolis
Figure 4.30 shows the distribution of residuals by census tract in selected areas in Indianapolis. It shows that the census tracts with high residuals are distributed in the northern, southern, and western areas of Indianapolis.
Reference to the remote sensing image of the local area showed that most of these areas were suburban and that some of the tracts were located in central Indianapolis, where many commercial buildings are located. The average ratio of children screened was 11.87 in the census tracts with high residuals and 12.19 in the rest of the census tracts.
4.6.4. Elkhart and Goshen
The selected census tract areas in Elkhart and Goshen are shown in Figure 4.31. The model generated by the stepwise method using SPSS for Elkhart and Goshen is given below. No valid model was generated by the backward elimination method using SPSS.
Ln (Ratio of EBLLs Children + 1) = 1.447 + 4.844 * (Ratio of Hispanic)
Figure 4.31 Selected Census Tracts in Elkhart and Goshen
The R2 value for this model is 0.651. Figure 4.32 shows the test of the homoscedasticity of the model. Figure 4.33 shows the residuals of the model. Other results are listed in Appendix A.
Figure 4.32 Test Homoscedasticity of Elkhart and Goshen Model
Figure 4.33 Residuals by Census Tract Selected Areas in Elkhart and Goshen
Figure 4.33 shows the distribution of residuals by census tract in selected areas in Elkhart and Goshen. The census tracts with high residuals in this area show no clear distribution pattern. The residuals in these areas are low, all of them lower than 0.50. The average ratio of children screened was 6.56 in the census tracts with high residuals and 13.24 in the remaining census tracts. This shows that the residuals are low for the census tracts with a high ratio of children screened in Elkhart and Goshen.
4.6.5. South Bend and Mishawaka
The selected census tract areas in South Bend and Mishawaka are shown in Figure 4.34. Two models are generated by the stepwise and backward elimination methods using SPSS for South Bend and Mishawaka.
The results are the same:
Ln (Ratio of EBLLs Children + 1) = - 0.373 + 7.392 * (Ratio of Vacant Housing) + 0.661 * (Average Family Size)
Figure 4.34 Selected Census Tracts in South Bend and Mishawaka
The R2 value of the model is 0.689. Figure 4.35 shows the test of the homoscedasticity of the model. Figure 4.36 shows the residuals of the model. Other results are listed in Appendix A.
Figure 4.35 Test Homoscedasticity of South Bend and Mishawaka Model
Figure 4.36 Residuals by Census Tract Selected Areas in South Bend and Mishawaka
Figure 4.36 shows the distribution of residuals by census tract in selected areas in South Bend and Mishawaka. The census tracts with high residuals are in the southern areas, which are suburban according to the remote sensing image of the local area. The average ratio of children screened was 11.15 in the census tracts with high residuals and 12.74 in the rest of the census tracts.
4.6.6. Fort Wayne
The selected census tract areas in Fort Wayne are shown in Figure 4.37. Two models are generated by the stepwise and backward elimination methods using SPSS for Fort Wayne, as shown below:
Model generated by stepwise:
Ln (Ratio of EBLLs Children + 1) = 5.269 - 3.981 * (Ratio of High School or above Education) + 3.259 * (Ratio of Housing Built before 1950)
Model generated by backward elimination:
Ln (Ratio of EBLLs Children + 1) = - 5.224 + 2.244 * (Average Family Size) + 1.887 * (Ratio of Rental Housing) - 7.447 * (Ratio of One Parent Family) + 5.251 * (Ratio of Housing Built before 1950) - 5.694 * (Ratio of Children under Five Years Living below the Poverty Level) + 3.594 * (Ratio of Children under Eighteen Years Living below the Poverty Level)
Figure 4.37 Selected Census Tracts in Fort Wayne
The second model is chosen because it has a higher R2 value, which is 0.825. Figure 4.38 shows the test of the homoscedasticity of the model.
Figure 4.39 shows the residuals of the model. Other results are listed in Appendix A.
Figure 4.38 Test Homoscedasticity of Fort Wayne Model
Figure 4.39 Residuals by Census Tract Selected Areas in Fort Wayne
Figure 4.39 shows the distribution of residuals by census tract in selected areas of Fort Wayne. This figure uses the same color pattern as Figure 4.24. It shows that the census tracts with high residuals are in the northern, eastern, and southern areas and that there is no spatial pattern in this distribution. The average ratio of children screened was 7.40 in the census tracts with high residuals and 5.55 in the rest of the census tracts. We could conclude that increasing the ratio of screened children might not help to reduce the residuals in the Fort Wayne model.
4.6.7. Northern Lake County
The selected census tract areas in Northern Lake County are shown in Figure 4.40. Two models are generated by the stepwise and backward elimination methods using SPSS for Northern Lake County, as shown below:
Model generated by stepwise:
Ln (Ratio of EBLLs Children + 1) = 1.255 + 7.484 * (Ratio of Vacant Housing)
Model generated by backward elimination:
Ln (Ratio of EBLLs Children + 1) = 0.218 + 1.361 * (Average Family Size) - 1.242 * (Ratio of Hispanic) + 0.696 * (Ratio of Black) - 3.453 * (Ratio of Housing Built before 1980) - 1.160 * (Ratio of Children under Eighteen Years Living below the Poverty Level)
Figure 4.40 Selected Census Tracts in Northern Lake County
The second model is chosen because it has a higher R2 value, which is 0.716. Figure 4.41 shows the test of the homoscedasticity of the model. Figure 4.42 shows the residuals of the model. Other results are listed in Appendix A.
Figure 4.41 Test Homoscedasticity of Northern Lake County Model

Figure 4.42 Residuals by Census Tract Selected Areas in Northern Lake County

Figure 4.42 shows the distribution of residuals by census tract in the selected areas of Northern Lake County. The census tracts with high residuals are spread around the outside of the selected areas, where many factories and schools are located. The average ratio of children screened is 4.79 in the census tracts with high residuals and 4.85 in the rest of the census tracts. Because the difference is so small, it is not clear whether increasing the ratio of children screened would help to decrease the residuals of the Northern Lake County model.

4.6.8. Clarksville, New Albany and Jeffersonville

The selected census tract areas in Clarksville, New Albany and Jeffersonville are shown in Figure 4.43. Two models were generated for Clarksville, New Albany and Jeffersonville by the stepwise and backward elimination methods in SPSS:

Model generated by stepwise:
Ln (Ratio of EBLLs Children + 1) = 3.769 - 4.215 * (Ratio of High School or above Education) + 12.216 * (Ratio of Vacant Housing)

Model generated by backward elimination:
Ln (Ratio of EBLLs Children + 1) = 1.101 + 1.133 * (Average Household Size) + 15.975 * (Ratio of Vacant Housing) - 4.620 * (Ratio of High School or above Education) + 3.051 * (Ratio of Housing Built before 1950) + 8.966 * (Ratio of Children under Five Years Living below the Poverty Level) - 4.734 * (Ratio of Children under Eighteen Years Living below the Poverty Level)

Figure 4.43 Selected Census Tracts in Clarksville, New Albany and Jeffersonville

The second model is chosen because it has the higher R² value, 0.715. Figure 4.44 shows the test of the homoscedasticity of the model. Figure 4.45 shows the residuals of the model. Other results are listed in Appendix A.
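The model choices in this chapter rest on comparing R² values. As a rough sketch of that comparison, the functions below compute R² from observed and predicted values, and the adjusted R² that SPSS reports alongside it in Appendix A. The function names and the short observed/predicted lists are illustrative assumptions; the Evansville figures (R² of .757 with one predictor and .797 with three predictors, for 36 tracts) are taken from Tables 7.11 and 7.13.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjust R squared for the number of predictors in the model."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Illustrative values only (assumed, not thesis data).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(round(r_squared(observed, predicted), 3))

# R squared values reported for Evansville in Appendix A (36 tracts).
stepwise_adj = adjusted_r_squared(0.757, 36, 1)
backward_adj = adjusted_r_squared(0.797, 36, 3)
print(round(stepwise_adj, 3), round(backward_adj, 3))
```

Since R² cannot decrease when predictors are added, the adjusted value is often compared as well. For Evansville the two adjusted values come out at about .750 and .778, which agrees with the SPSS output in Appendix A.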
Figure 4.44 Test Homoscedasticity of Clarksville, New Albany and Jeffersonville Model

Figure 4.45 Residuals by Census Tract Selected Areas in Clarksville, New Albany and Jeffersonville

Figure 4.45 shows the distribution of residuals by census tract in the selected areas of Clarksville, New Albany and Jeffersonville. The census tracts with high residuals are mainly distributed outside Clarksville, New Albany and Jeffersonville, and two of them are located in the central area. Many of these tracts are rural and cover large areas of forest or commercial land. The average ratio of children screened is 11.15 in the census tracts with high residuals and 12.74 in the rest of the census tracts. Because the high-residual tracts have the lower screening ratio, it could be concluded that increasing the ratio of children screened would help to decrease the average residual of the Clarksville, New Albany and Jeffersonville model.

4.6.9. Comparison of Models Based on City Size

Based on population, the selected urban areas in Indiana are categorized into two classes: large urban areas with populations greater than 200,000 and small urban areas with populations less than 200,000. The large urban areas include Northern Lake County, Fort Wayne and Indianapolis, and the smaller urban areas include Evansville, Elkhart-Goshen, Muncie, Clarksville, New Albany and Jeffersonville, and South Bend and Mishawaka.

The models of large urban areas are summarized in Table 4.1. In large urban areas, children’s EBLLs are positively associated with average family size and the ratio of housing built before 1950, but negatively associated with the ratio of housing built before 1980 and the ratio of children under five years living below the poverty level, in two of the generated models of large urban areas.
This means that these four variables are likely candidates for common variables in large urban areas. Whether the ratio of rental housing, the ratio of black, the ratio of Hispanic, the ratio of one parent family, and the ratio of high school or above education are related to children’s EBLLs is unclear because each appears in only one model. Average household size is positively associated with children’s EBLLs in Fort Wayne but negatively associated in Indianapolis, and the ratio of children under eighteen years living below the poverty level is positively associated with children’s EBLLs in Fort Wayne and Indianapolis but negatively associated in Northern Lake County.

The models of small urban areas are summarized in Table 4.2. In small urban areas, children’s EBLLs are positively associated with the ratio of vacant housing in four models, and with average household size and the ratio of housing built before 1950 in two models. The relationship between children’s EBLLs and average family size, the ratio of Hispanic, the ratio of children under five years living below the poverty level, the ratio of children under eighteen years living below the poverty level, the ratio of housing built before 1980, the ratio of one parent family, and the ratio of high school or above education is unclear because each appears in only one model.

4.6.10. Comparison of Models Based on Location

Based on location, the selected urban areas in Indiana are categorized into three regions: South Bend and Mishawaka, Northern Lake County, Elkhart and Goshen, and Fort Wayne in Northern Indiana; Indianapolis and Muncie in Central Indiana; and Evansville and Clarksville, New Albany and Jeffersonville in Southern Indiana.

The models of the urban areas in Northern Indiana are summarized in Table 4.3.
In Northern Indiana, children’s EBLLs are positively associated with average family size, as two models contain this variable. The models contradict one another on the ratio of Hispanic and the ratio of children under eighteen years living below the poverty level. The ratio of vacant housing, the ratio of rental housing, average household size, the ratio of black, the ratio of children under five years living below the poverty level, the ratio of housing built before 1950, the ratio of housing built before 1980, and the ratio of one parent family each appear in only one model.

The models for the urban areas in Central Indiana are summarized in Table 4.4. In Central Indiana, children’s EBLLs are negatively associated with the ratio of housing built before 1980. The ratio of vacant housing, average household size, average family size, the ratio of children under eighteen years living below the poverty level, the ratio of housing built before 1950, the ratio of one parent family, and the ratio of high school or above education each appear in only one model.

The models of the urban areas in Southern Indiana are summarized in Table 4.5. In Southern Indiana, children’s EBLLs are positively associated with the ratio of vacant housing, average household size, and the ratio of housing built before 1950 in both models. The ratio of children under five years living below the poverty level, the ratio of children under eighteen years living below the poverty level, and the ratio of high school or above education each appear in only one model.

4.6.11. Comparison of Models Based on Accuracy

In this research, a state-scale model was applied to the eight urban areas and the residuals were calculated.
The error of this model was computed as the absolute residual divided by the actual value. The errors of the urban models were calculated by the same procedure. All the results are shown in Table 4.6, which compares each urban model with the state-scale model applied to the same area. The urban models are more accurate than the state-scale model in most of the areas. The exceptions are Indianapolis and Elkhart and Goshen, where the state-scale model is suitable because the difference between the state-scale model and the urban model is not large. However, the state-scale model would cause large errors in other areas such as Northern Lake County, Fort Wayne, or Clarksville, New Albany and Jeffersonville.

Table 4.1 Comparison of Model Parameters for Large Urban Areas
Ratio of High Ratio of Children under Ratio of Children under Ratio of Ratio of One Housing Housing School or Ratio of Rental Ave HH Ave Family Ratio of Five Years Living below Eighteen Years Living Name Hispanic Parent Family before 1950 before 1980 above the Poverty Level below the Poverty Level Housing Size Size Black
1.361 0.696 -1.16 -3.453 -1.242 Northern Lake County
1.887 2.244 -5.694 3.594 5.251 -7.447 Fort Wayne
-0.667 1.286 -1.953 1.626 1.319 -1.215 -1.285 Indianapolis

Table 4.2 Comparison of Model Parameters for Small Urban Areas
Name Ratio of Children under Ratio of Children under Ratio of Vacant Ave HH Ave Family Ratio of Five Years Living below Eighteen Years Living Housing Size Size Hispanic the Poverty Level below the Poverty Level Housing before 1950 Housing Ratio of One Ratio of High before 1980 Parent Family School or above 4.844 7.304 Clarksville-New Albany-Jeffersonville 15.975 South Bend-Mishawaka Evansville 7.392 9.408 -3.474 1.133 8.966 -4.734 3.051 6.297 -4.62 0.661 0.663 1.074 Elkhart-Goshen Muncie

Table 4.3 Comparison of Model Parameters for Northern Indiana
Name South Bend-Mishawaka Northern Lake County Elkhart-Goshen Fort Wayne Ratio of Vacant Ratio of Rental Ave HH Ave Family Housing Housing Size Size 7.392 0.661 1.361 1.887 Ratio of Hispanic Ratio of Black -1.242 4.844 0.696 2.244 Ratio of Children Ratio of Children under Housing Housing Ratio of One under Five Years Eighteen Years Living before 1950 before 1980 Parent Family Living below the below the Poverty Level Poverty Level -1.16 -5.694 3.594 -3.453 5.251 -7.447

Table 4.4 Comparison of Model Parameters for Central Indiana
Name Indianapolis Muncie Ratio of Vacant Housing Ave HH Size Ave Family Size -0.667 1.286 Ratio of Children under Housing before Housing before Eighteen Years Living 1950 1980 below the Poverty Level 1.626 1.319 7.304 -1.215 -3.474 Ratio of One Parent Family Ratio of Children under Ratio of High School or Five Years Living below above the Poverty Level -1.285 -1.953 6.297

Table 4.5 Comparison of Model Parameters for Southern Indiana
Name Clarksville-New Albany-Jeffersonville Evansville Ratio of Vacant Housing Ave HH Size 15.975 1.133 9.408 0.663 Ratio of Children under Ratio of Children under Housing before Ratio of High Five Years Living below Eighteen Years Living 1950 School or above the Poverty Level below the Poverty Level 8.966 -4.734 3.051 -4.62 1.074

Table 4.6 Comparison of Models of different urban areas with the Model of Indiana
Name                                          Urban Model   Urban Model   State Model   State Model
                                              Average       Error (%)     Average       Error (%)
                                              Residual                    Residual
Muncie                                        0.3522        27.704        0.417         33.073
Clarksville, New Albany and Jeffersonville    0.315         27.729        0.528         52.371
South Bend and Mishawaka                      0.241         11.802        0.38          17.637
Indianapolis                                  0.309         21.568        0.316         23.139
Fort Wayne                                    0.28          15.904        0.521         23.808
Evansville                                    0.255         23.198        0.321         36.214
Elkhart and Goshen                            0.264         14.191        0.334         15.475
Northern Lake County                          0.267         15.113        0.425         22.052

5. SUMMARY AND DISCUSSION

In this research, children’s EBLLs are analyzed based on lead screening data from Indiana.
According to Figure 4.1 and Figure 4.2, the number and percentage of children’s EBLLs above 10 µg/dL and above 20 µg/dL decreased from 1998 to 2002. Figures 4.3 through 4.10 show the detailed changes in children’s EBLLs in the selected urban areas.

Each model was tested with its R² value, a test of homoscedasticity, and a residuals map. The R² values of the models range from 0.5 to 0.8, and most fall between 0.6 and 0.8. The charts of regression standardized residuals against regression standardized predicted values point to the validity of the generated models. The residual maps of the eight selected urban areas show that the census tracts with high residuals are located on the outer periphery of most urban areas and include rural, suburban, educational or commercial areas, where fewer residents live. Elkhart and Goshen was an exception to this distribution. It was also found that, for some models, the residual was lower in the census tracts with a high ratio of children screened.

Children’s EBLLs varied across the entire state and across the different urban areas in Indiana. In each of the urban areas, not all of the twelve parameters were used in the final models. Through the analysis of these models, common parameters were found according to the size and location of the urban areas.

Children’s EBLLs are correlated with socio-economic variables that depend on urban area size. In large urban areas, children’s EBLLs are related to average family size, the ratio of children under five years living below the poverty level, the ratio of housing built before 1950, and the ratio of housing built before 1980. In smaller urban areas, children’s EBLLs are related to the ratio of vacant housing, average household size, and the ratio of housing built before 1950.
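The search for common parameters described here can be sketched as a tally of coefficient signs across models. The example below uses only the two large-area models whose backward elimination equations are given in Sections 4.6.6 and 4.6.7 (Fort Wayne and Northern Lake County); the variable keys and the function name are hypothetical labels, and the Indianapolis model is left out to keep the sketch short.

```python
# Coefficients copied from the backward elimination equations in
# Sections 4.6.6 (Fort Wayne) and 4.6.7 (Northern Lake County).
MODELS = {
    "Fort Wayne": {
        "ave_family_size": 2.244,
        "ratio_rental": 1.887,
        "ratio_one_parent": -7.447,
        "ratio_built_before_1950": 5.251,
        "ratio_children_u5_poverty": -5.694,
        "ratio_children_u18_poverty": 3.594,
    },
    "Northern Lake County": {
        "ave_family_size": 1.361,
        "ratio_hispanic": -1.242,
        "ratio_black": 0.696,
        "ratio_built_before_1980": -3.453,
        "ratio_children_u18_poverty": -1.160,
    },
}


def classify_shared_variables(models):
    """Label each variable found in two or more models by sign agreement."""
    signs = {}
    for coefficients in models.values():
        for name, value in coefficients.items():
            signs.setdefault(name, []).append(value > 0)
    result = {}
    for name, flags in signs.items():
        if len(flags) < 2:
            continue  # appears in one model only: relationship unclear
        if all(flags):
            result[name] = "positive in all"
        elif not any(flags):
            result[name] = "negative in all"
        else:
            result[name] = "contradictory"
    return result


print(classify_shared_variables(MODELS))
```

With these two models, average family size is shared with a positive sign, while the ratio of children under eighteen years living below the poverty level is shared with opposite signs, which matches the contradiction noted in Section 4.6.9.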
In both large and smaller urban areas, the models are related to the ratio of old housing stock. These results are consistent with other research (Haley and Talbot, 2004; Kim et al., 2008).

Based on location, there is no common parameter across the models of northern, central and southern Indiana. In northern Indiana, children’s EBLLs are associated with average family size, and in central Indiana, children’s EBLLs are associated with the ratio of housing built before 1980. In southern Indiana, children’s EBLLs are associated with the ratio of vacant housing, average household size, and the ratio of housing built before 1950. More common parameters were found in the southern Indiana models than in the northern and central Indiana models.

On the other hand, some limitations are inherent in this research. Because the children’s EBLL information comes from the Indiana State Department of Health, this research could not control the procedure for collecting samples of screened children or the accuracy of the sample data. To improve the quality of the research data, two filters were used to exclude unsatisfactory data. The filters could increase the accuracy of the data, but they reduce the number of records, and the possibility of introducing error still exists, which needs to be considered in future research. In addition, the figures and charts of children screened in each urban area from 1998 to 2002 in Chapter Four show that the changes in children’s EBLLs were not consistent. One of the reasons is that the sample was not consistent through the five-year period. A test was made in Chapter Four to measure whether the inconsistent samples would change the South Bend and Mishawaka model significantly. The results showed that this would only change the model slightly without introducing new parameters.
However, it is not clear whether the inconsistent samples could be ignored in other states or at different spatial resolutions such as census blocks or zip code areas. Another research limitation is that the socio-economic and housing information for the census tract areas was calculated using 2000 census data. It is assumed that this information would not change dramatically through the five-year period. However, this information could have changed through time, which might introduce related errors. In addition, the distribution of the residuals of the dependent variable was not normal in the Indianapolis model, which might affect the stability of the generated model.

Aside from the limitations of this research, some suggestions for future study of children’s EBLLs include the following. First, in order to compare different models in different urban areas, this research standardized a method of selecting independent parameters by stepwise and backward elimination using SPSS. These methods could be used to compare the city models with models of cities in other states. Second, both state and urban area models were generated in this research. None of the models had exactly the same parameters. This result shows the necessity of generating models for different locations. Third, it was found that some of the same parameters existed in the models of the same urban size or location in Indiana. The results show that geographic factors could be potential elements in building a model for children’s EBLLs. How to incorporate geographic parameters into the model requires additional research. Fourth, applying the state model to each of the urban areas can help to test whether a state-level model could be applied to each location. From Table 4.6, it can be concluded that the state-level model has poorer accuracy than the urban area models.
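The comparison drawn from Table 4.6 can be made explicit with a short calculation. The sketch below copies the error percentages from Table 4.6 and ranks the areas by how much the state model's error exceeds the urban model's. The helper `percent_error` shows one possible reading of "absolute residual divided by the actual value"; averaging over tracts and skipping zero actual values are assumptions of this sketch, not the procedure documented in the thesis, and all function names are hypothetical.

```python
# (urban model error %, state model error %) copied from Table 4.6.
TABLE_4_6 = {
    "Muncie": (27.704, 33.073),
    "Clarksville, New Albany and Jeffersonville": (27.729, 52.371),
    "South Bend and Mishawaka": (11.802, 17.637),
    "Indianapolis": (21.568, 23.139),
    "Fort Wayne": (15.904, 23.808),
    "Evansville": (23.198, 36.214),
    "Elkhart and Goshen": (14.191, 15.475),
    "Northern Lake County": (15.113, 22.052),
}


def percent_error(actual, predicted):
    """Mean of |residual| / actual, in percent (assumed definition).

    Tracts with an actual value of zero are skipped to avoid division
    by zero; this handling is an assumption of the sketch.
    """
    ratios = [abs(a - p) / a for a, p in zip(actual, predicted) if a != 0]
    return 100.0 * sum(ratios) / len(ratios)


def error_gaps(table):
    """Return (area, state error - urban error), smallest gap first."""
    gaps = [(area, round(state - urban, 3))
            for area, (urban, state) in table.items()]
    return sorted(gaps, key=lambda item: item[1])


for area, gap in error_gaps(TABLE_4_6):
    print(f"{area}: {gap}")
```

Ranked this way, Elkhart and Goshen and Indianapolis show the smallest gaps, in line with the exceptions noted in Section 4.6.11, while Clarksville, New Albany and Jeffersonville shows the largest.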
However, building a model for each urban area would require substantial time and may not be achievable because of the lack of data. How to balance the weaknesses of the urban area models and the state model would be a fruitful direction for further study.

This research primarily explored different models of children’s BLLs in Indiana based on socio-economic and housing parameters that are inherently geographic. Some researchers also suggest linking children’s BLLs with lead dust changes, weather, and soil moisture, as listed in Chapter Two. With the data available, this research could not achieve that goal. In addition, building the models at finer resolutions, such as individual-level data, would be more accurate and could help to better understand the parameters in the models.

6. REFERENCES

Canfield RL, Henderson CR, Cory-Slechta DA, Cox C, Jusko TA, Lanphear BP. (2003). Intellectual impairment in children with blood lead concentrations below 10 µg per deciliter. N Engl J Med 348:1517-1526
CDC. (1997). Screening Young Children for Lead Poisoning: Guidance for State and Local Public Health Officials. Atlanta, GA: Centers for Disease Control and Prevention.
Mushak P. (1992). Defining lead as the premiere environmental health issue for children in America: criteria and their quantitative application. Environmental Research 59:281-309
Tonny J. Oyana and Florence M. Margai. (2007). Geographic Analysis of Health Risks of Pediatric Lead Exposure: A Golden Opportunity to Promote Healthy Neighborhoods. Archives of Environmental & Occupational Health 62(2):93-104
Kristen Rappazzo, Curtis E. Cummings, Robert M. Himmelsbach, and Richard Tobin. (2007). The Effect of Housing Compliance Status on Children’s Blood Lead Levels. Archives of Environmental & Occupational Health 62(2):81-85
Landrigan PJ, Schechter CB, Lipton JM, Fahs MC, Schwartz J. (2002).
Environmental pollutants and disease in American children: estimates of morbidity, mortality and costs for lead poisoning, asthma, cancer, and developmental disabilities. Environmental Health Perspectives 110:721-728
Dohyeong Kim, M. Alicia Overstreet Galeano, Andrew Hull, and Marie Lynn Miranda. (2008). A Framework for Widespread Replication of a Highly Spatially Resolved Childhood Lead Exposure Risk Model. Environmental Health Perspectives 116(12):1735-1739
Ellen K. Cromley and Sara L. McLafferty. (2002). GIS and Public Health. New York: The Guilford Press, A Division of Guilford Publications, Inc.
Indiana State Department of Health. (2004). Indiana’s Childhood Lead Poisoning Elimination Plan.
Lisel A. O’Dwyer. (1998). The Use of GIS in Identifying Risk of Elevated Blood Lead Levels in Australia. GIS in Public Health 3rd National Conference. San Diego, CA
Valerie B. Haley and Thomas O. Talbot. (2004). Geographic Analysis of Blood Lead Levels in New York State Children Born 1994-1997. Environmental Health Perspectives 112:1577-1582
Thomas O. Talbot, Steven P. Forand, and Valerie B. Haley. (1998). Geographic Analysis of Childhood Exposure in New York State. GIS in Public Health 3rd National Conference. San Diego, CA
Kay Bruening, Francis W. Kemp, Nicole Simone, Yvette Holding, Donald B. Louria, and John D. Bogden. (1999). Dietary Calcium Intakes of Urban Children at Risk of Lead Poisoning. Environmental Health Perspectives 107(6):431-435
Mark R. Farfel, Anna O. Orlova, Peter S. J. Lees, Charles Rohde, Peter J. Ashley, and Julian Chisolm, Jr. (2003). A Study of Urban Housing Demolitions as Sources of Lead in Ambient Dust: Demolition Practices and Exterior Dust Fall. Environmental Health Perspectives 111(9):1228-1234
Ghassan A. Khoury and Gary L. Diamond. (2003). Risks to children from exposure to lead in air during remedial or removal activities at Superfund sites: A case study of the RSR lead smelter Superfund site.
Journal of Exposure Analysis and Environmental Epidemiology 13(1):51-65
Mark A.S. Laidlaw, Howard W. Mielke, Gabriel M. Filippelli, David L. Johnson, and Christopher R. Gonzales. (2005). Seasonality and Children’s Blood Lead Levels: Developing a Predictive Model Using Climatic Variables and Blood Lead Data from Indianapolis, Indiana, Syracuse, New York, and New Orleans, Louisiana (USA). Environmental Health Perspectives 113(6):793-800
Marie Lynn Miranda, Dana C. Dolinoy, and Alicia Overstreet. (2002). Mapping for Prevention: GIS Models for Directing Childhood Lead Poisoning Prevention Programs. Environmental Health Perspectives 110(9):947-953
James R. Roberts, Thomas C. Hulsey, Gerald B. Curtis, and J. Routt Reigart. (2003). Using Geographic Information Systems to Assess Risk for Elevated Blood Lead Levels in Children. Public Health Reports 118:221-228
Florence Lansana Margai. (1998). Geographic Information Analysis of Pediatric Lead Poisoning. GIS in Public Health 3rd National Conference. San Diego, CA
J. Chapman McGrew, Jr. and Charles B. Monroe. (2000). An Introduction to Statistical Problem Solving in Geography. Boston, Madison, New York: McGraw Hill Press
Indiana Geological Survey. (2008). http://129.79.145.7/arcims/statewide_mxd/index.html

APPENDIX A

7.1 State Level Model:

Table 7.1: Test the Normal Distribution of residuals of dependent variable in State level Model
Tests of Normality a Kolmogorov-Smirnov Shapiro-Wilk Statistic df Sig. Statistic df Unstandardized Residua .032 465 .200* .995 465 *. This is a lower bound of the true significance. a. Lilliefors Significance Correction

Figure 7.1: Histogram of residual of dependent variable in Indiana State Sig.
.125 Table 7.2: Result of using the stepwise for selected tracts in Indiana Model Summaryg Change Statistics Model 1 2 3 4 5 6 R R Square .515a .265 b .656 .430 .692c .479 d .733 .538 .743e .552 .747f .558 Adjusted R Square .263 .427 .476 .534 .547 .552 Std. Error of the Estimate ********** ********** ********** ********** ********** ********** R Square Change .265 .165 .049 .059 .015 .005 F Change 166.943 133.496 43.498 58.310 15.151 5.504 df1 1 1 1 1 1 1 df2 463 462 461 460 459 458 Sig. F Change .000 .000 .000 .000 .000 .019 DurbinWatson 1.637 a. Predictors: (Constant), AVE_FAM_SZ b. Predictors: (Constant), AVE_FAM_SZ, AVE_HH_SZ c. Predictors: (Constant), AVE_FAM_SZ, AVE_HH_SZ, Ratio_1980 d. Predictors: (Constant), AVE_FAM_SZ, AVE_HH_SZ, Ratio_1980, Ratio_1950 e. Predictors: (Constant), AVE_FAM_SZ, AVE_HH_SZ, Ratio_1980, Ratio_1950, Poverty_Ra f. Predictors: (Constant), AVE_FAM_SZ, AVE_HH_SZ, Ratio_1980, Ratio_1950, Poverty_Ra, Ratio_H_Ed g. Dependent Variable: Ln(Ratio_Pb_10+1) 73 Table 7.3: Result of using the backward elimination method for selected tracts in Indiana Model Summaryh Change Statistics Model 1 2 3 4 5 6 7 R R Square .749a .561 b .749 .561 .749c .561 d .749 .561 .749e .560 .748f .559 g .747 .558 Adjusted R Square .549 .550 .551 .552 .553 .553 .552 Std. Error of the Estimate ********** ********** ********** ********** ********** ********** ********** R Square Change .561 .000 .000 .000 .000 -.001 -.002 F Change 48.091 .016 .024 .152 .261 .933 1.848 df1 12 1 1 1 1 1 1 df2 452 452 453 454 455 456 457 Sig. F Change .000 .898 .876 .697 .610 .335 .175 DurbinWatson 1.637 a. Predictors: (Constant), Poverty_Ra, Ratio_1980, AVE_HH_SZ, Ratio_1950, Ratio_Hisp, Ratio_Vaca, Ratio_Blac, Ratio_H_Ed, Ratio_Rent, Poverty_Wi, Ratio_One_, AVE_FAM_SZ b. Predictors: (Constant), Poverty_Ra, Ratio_1980, AVE_HH_SZ, Ratio_1950, Ratio_Hisp, Ratio_Vaca, Ratio_Blac, Ratio_H_Ed, Ratio_Rent, Ratio_One_, AVE_FAM_SZ c. 
Predictors: (Constant), Poverty_Ra, Ratio_1980, AVE_HH_SZ, Ratio_1950, Ratio_Hisp, Ratio_Vaca, Ratio_Blac, Ratio_H_Ed, Ratio_Rent, AVE_FAM_SZ d. Predictors: (Constant), Poverty_Ra, Ratio_1980, AVE_HH_SZ, Ratio_1950, Ratio_Vaca, Ratio_Blac, Ratio_H_Ed, Ratio_Rent, AVE_FAM_SZ e. Predictors: (Constant), Poverty_Ra, Ratio_1980, AVE_HH_SZ, Ratio_1950, Ratio_Vaca, Ratio_H_Ed, Ratio_Rent, AVE_FAM_SZ f. Predictors: (Constant), Poverty_Ra, Ratio_1980, AVE_HH_SZ, Ratio_1950, Ratio_H_Ed, Ratio_Rent, AVE_FAM_SZ g. Predictors: (Constant), Poverty_Ra, Ratio_1980, AVE_HH_SZ, Ratio_1950, Ratio_H_Ed, AVE_FAM_SZ h. Dependent Variable: Ln(Ratio_Pb_10+1) 74 Table 7.4: Coefficient of Indiana Model Coefficientsa Model 1 (Constant) AVE_FAM_SZ AVE_HH_SZ Ratio_1980 Ratio_1950 Poverty_Ra Ratio_H_Ed Unstandardized Coefficients B Std. Error -1.223 .537 2.099 .216 -.997 .149 -1.171 .145 2.019 .321 .573 .227 -.762 .325 Standardized Coefficients Beta .639 -.386 -.275 .215 .110 -.099 t -2.277 9.719 -6.672 -8.100 6.292 2.530 -2.346 Sig. .023 .000 .000 .000 .000 .012 .019 Zero-order .515 .154 -.387 .362 .490 -.509 Correlations Partial .414 -.298 -.354 .282 .117 -.109 Part .302 -.207 -.252 .196 .079 -.073 Collinearity Statistics Tolerance VIF .224 .288 .836 .828 .511 .546 4.471 3.467 1.196 1.208 1.956 1.833 a. Dependent Variable: Ln(Ratio_Pb_10+1) 75 76 Figure 7.2: Distribution of dependent variable to each independent variable in Indiana. 77 7.2. Muncie Model: Table 7.5: Test the Normal Distribution of residual of dependent variable in Muncie Model Tests of Normality a Unstandardized Residual Kolmogorov-Smirnov Statistic df Sig. .106 21 .200* Statistic .967 Shapiro-Wilk df 21 Sig. .670 *. This is a lower bound of the true significance. a. Lilliefors Significance Correction Figure 7.3: Histogram of dependent variable in Muncie Table 7.6: Result of using the stepwise method for selected tracts in Muncie b Model Summary Model 1 R R Square .783a .613 Change Statistics Adjusted Std. 
Error of R Square df1 df2 Sig. F Change R Square the Estimate Change F Change .593 ********** .613 30.105 1 19 .000 DurbinWatson 1.672 a. Predictors: (Constant), Ratio_1980 b. Dependent Variable: Ln(Ratio_Pb_10+1) Table 7.7: Coefficients of Muncie Model using stepwise method a Coefficients Model 1 (Constant) Ratio_1980 Unstandardized Standardized Coefficients Coefficients B Std. Error Beta 3.444 .375 -3.299 .601 -.783 a. Dependent Variable: Ln(Ratio_Pb_10+1) t 9.179 -5.487 Correlations Sig. Zero-order Partial .000 .000 -.783 -.783 Part -.783 Collinearity Statistics Tolerance VIF 1.000 1.000 Table 7.8: Result of using the backward elimination method for selected tracts in Muncie Model Summary k Change Statistics Model 1 2 3 4 5 6 7 8 9 10 R .876a .876b .876c .876d .871e .868f .861g .848h .820i .803j R Square .768 .768 .768 .767 .759 .754 .741 .719 .673 .644 Adjusted R Square .419 .484 .535 .576 .599 .622 .630 .626 .591 .582 Std. Error of the Estimate ********** ********** ********** ********** ********** ********** ********** ********** ********** ********** R Square Change .768 .000 .000 -.001 -.007 -.005 -.013 -.022 -.047 -.028 F Change 2.202 .001 .003 .040 .350 .255 .698 1.169 2.492 1.370 df1 df2 12 1 1 1 1 1 1 1 1 1 8 8 9 10 11 12 13 14 15 16 Sig. F Change .134 .979 .961 .846 .566 .622 .419 .298 .135 .259 DurbinWatson 1.487 a. Predictors: (Constant), Poverty_Ra, Ratio_1950, Ratio_1980, AVE_HH_SZ, Ratio_Blac, Ratio_Rent, Ratio_Hisp, AVE_FAM_SZ, Ratio_H_Ed, Poverty_Wi, Ratio_One_, Ratio_Vaca b. Predictors: (Constant), Poverty_Ra, Ratio_1950, Ratio_1980, Ratio_Blac, Ratio_Rent, Ratio_Hisp, AVE_FAM_SZ, Ratio_H_Ed, Poverty_Wi, Ratio_One_, Ratio_Vaca c. Predictors: (Constant), Poverty_Ra, Ratio_1950, Ratio_1980, Ratio_Blac, Ratio_Rent, Ratio_Hisp, AVE_FAM_SZ, Poverty_Wi, Ratio_One_, Ratio_Vaca d. Predictors: (Constant), Poverty_Ra, Ratio_1950, Ratio_1980, Ratio_Blac, Ratio_Rent, Ratio_Hisp, AVE_FAM_SZ, Ratio_One_, Ratio_Vaca e. 
Predictors: (Constant), Poverty_Ra, Ratio_1950, Ratio_Blac, Ratio_Rent, Ratio_Hisp, AVE_FAM_SZ, Ratio_One_, Ratio_Vaca f. Predictors: (Constant), Poverty_Ra, Ratio_Blac, Ratio_Rent, Ratio_Hisp, AVE_FAM_SZ, Ratio_One_, Ratio_Vaca g. Predictors: (Constant), Poverty_Ra, Ratio_Blac, Ratio_Rent, Ratio_Hisp, Ratio_One_, Ratio_Vaca h. Predictors: (Constant), Poverty_Ra, Ratio_Rent, Ratio_Hisp, Ratio_One_, Ratio_Vaca i. Predictors: (Constant), Poverty_Ra, Ratio_Rent, Ratio_One_, Ratio_Vaca j. Predictors: (Constant), Poverty_Ra, Ratio_One_, Ratio_Vaca k. Dependent Variable: Ln(Ratio_Pb_10+1) 78 Table 7.9: Coefficients of Muncie Model using the backward elimination method a Coefficients Unstandardized Standardized Coefficients Coefficients Model B Std. Error Beta 1 (Constant) .848 .262 Ratio_Vaca 7.304 1.561 .791 Poverty_Ra -3.474 .953 -.702 Ratio_One_ 6.297 2.397 .462 t 3.236 4.679 -3.647 2.627 Sig. .005 .000 .002 .018 Correlations Zero-order Partial .588 -.034 .336 .750 -.663 .537 Part .677 -.527 .380 Collinearity Statistics Tolerance VIF .731 .564 .677 1.367 1.773 1.477 a. Dependent Variable: Ln(Ratio_Pb_10+1) 79 80 Figure 7.4: Distribution of dependent variable according to each independent variable in Muncie 81 7.3 Evansville Model: Table 7.10: Test the Normal Distribution of residual of dependent variable in Evansville Model Tests of Normality a Kolmogorov-Smirnov Statistic df Sig. Unstandardized Residua .110 36 .200* Statistic .977 Shapiro-Wilk df 36 Sig. .641 *. This is a lower bound of the true significance. a. Lilliefors Significance Correction Figure 7.5: Histogram of residual of dependent variable in Evansville. Table 7.11: Result of using the stepwise for selected tracts in Evansville Model Summaryb Change Statistics Model 1 R R Square .870a .757 Adjusted R Square .750 Std. Error of the Estimate .375 R Square Change .757 F Change 105.966 df1 df2 1 34 Sig. F Change .000 DurbinWatson 2.474 a. Predictors: (Constant), Ratio_Vaca b. 
Dependent Variable: Ln(Ratio_pb_10+1) Table 7.12: Coefficients of Evansville Model using stepwise method Coefficientsa Model 1 (Constant) Ratio_Vaca Unstandardized Coefficients B Std. Error .757 .114 10.460 1.016 Standardized Coefficients Beta a. Dependent Variable: Ln(Ratio_pb_10+1) .870 t 6.658 10.294 Sig. Zero-order .000 .000 .870 Correlations Partial .870 Part .870 Collinearity Statistics Tolerance VIF 1.000 1.000 Table 7.13: Result of using the backward elimination method for selected tracts in Evansville Model Summaryk Change Statistics Model 1 2 3 4 5 6 7 8 9 10 R .907a .907b .907c .907d .907e .907f .905g .902h .897i .893j R Square .823 .823 .823 .822 .822 .822 .818 .813 .804 .797 Adjusted R Square .731 .742 .752 .761 .770 .778 .781 .782 .779 .778 Std. Error of the Estimate .390 .382 .374 .367 .360 .354 .351 .351 .353 .353 R Square Change .823 .000 .000 .000 .000 .000 -.004 -.005 -.009 -.007 F Change 8.907 .009 .034 .015 .016 .045 .570 .876 1.436 1.049 df1 df2 12 1 1 1 1 1 1 1 1 1 23 23 24 25 26 27 28 29 30 31 Sig. F Change .000 .924 .855 .903 .901 .834 .456 .357 .240 .314 DurbinWatson 2.425 a. Predictors: (Constant), Poverty_Ra, Ratio_1950, AVE_HH_SZ, Ratio_1980, Ratio_Hisp, Ratio_H_Ed, Ratio_Blac, Ratio_One_, Poverty_Wi, Ratio_Vaca, AVE_FAM_SZ, Ratio_Rent b. Predictors: (Constant), Poverty_Ra, Ratio_1950, AVE_HH_SZ, Ratio_1980, Ratio_Hisp, Ratio_H_Ed, Ratio_Blac, Ratio_One_, Ratio_Vaca, AVE_FAM_SZ, Ratio_Rent c. Predictors: (Constant), Poverty_Ra, Ratio_1950, AVE_HH_SZ, Ratio_1980, Ratio_Hisp, Ratio_H_Ed, Ratio_Blac, Ratio_One_, Ratio_Vaca, AVE_FAM_SZ d. Predictors: (Constant), Poverty_Ra, Ratio_1950, AVE_HH_SZ, Ratio_1980, Ratio_Hisp, Ratio_H_Ed, Ratio_Blac, Ratio_Vaca, AVE_FAM_SZ e. Predictors: (Constant), Poverty_Ra, Ratio_1950, Ratio_1980, Ratio_Hisp, Ratio_H_Ed, Ratio_Blac, Ratio_Vaca, AVE_FAM_SZ f. Predictors: (Constant), Poverty_Ra, Ratio_1950, Ratio_1980, Ratio_Hisp, Ratio_H_Ed, Ratio_Vaca, AVE_FAM_SZ g. 
Predictors: (Constant), Poverty_Ra, Ratio_1950, Ratio_1980, Ratio_Hisp, Ratio_Vaca, AVE_FAM_SZ
h. Predictors: (Constant), Poverty_Ra, Ratio_1950, Ratio_1980, Ratio_Vaca, AVE_FAM_SZ
i. Predictors: (Constant), Ratio_1950, Ratio_1980, Ratio_Vaca, AVE_FAM_SZ
j. Predictors: (Constant), Ratio_1950, Ratio_Vaca, AVE_FAM_SZ
k. Dependent Variable: Ln(Ratio_pb_10+1)

Table 7.14: Coefficients of the Evansville Model using the backward elimination method

| Model | Predictor | Unstd. B | Unstd. Std. Error | Std. Beta | t | Sig. | Corr. Zero-order | Corr. Partial | Corr. Part | Tolerance | VIF |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | (Constant) | -1.270 | .981 | | -1.295 | .204 | | | | | |
| | Ratio_1950 | 1.074 | .581 | .148 | 1.849 | .074 | .134 | .311 | .147 | .987 | 1.013 |
| | Ratio_Vaca | 9.408 | 1.101 | .783 | 8.541 | .000 | .870 | .834 | .680 | .754 | 1.326 |
| | AVE_FAM_SZ | .663 | .346 | .176 | 1.917 | .064 | .548 | .321 | .153 | .747 | 1.338 |

a. Dependent Variable: Ln(Ratio_pb_10+1)

Figure 7.6: Distribution of dependent variable according to each independent variable in Evansville

7.4 Indianapolis Model:

Table 7.15: Tests of normality for the residuals of the dependent variable in the Indianapolis Model

| | Kolmogorov-Smirnov(a) Statistic | df | Sig. | Shapiro-Wilk Statistic | df | Sig. |
|---|---|---|---|---|---|---|
| Unstandardized Residual | .086 | 125 | .024 | .986 | 125 | .230 |

a. Lilliefors Significance Correction

Figure 7.7: Histogram of the residuals of the dependent variable in Indianapolis.

Table 7.16: Results of the stepwise method for selected tracts in Indianapolis (Model Summary)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | R Square Change | F Change | df1 | df2 | Sig. F Change | Durbin-Watson |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | .616a | .379 | .374 | ********** | .379 | 75.139 | 1 | 123 | .000 | |
| 2 | .759b | .576 | .569 | ********** | .196 | 56.426 | 1 | 122 | .000 | |
| 3 | .787c | .619 | .610 | ********** | .044 | 13.901 | 1 | 121 | .000 | |
| 4 | .810d | .655 | .644 | ********** | .036 | 12.606 | 1 | 120 | .001 | |
| 5 | .823e | .677 | .663 | ********** | .021 | 7.787 | 1 | 119 | .006 | 2.042 |

a. Predictors: (Constant), Ratio_1980
b. Predictors: (Constant), Ratio_1980, Poverty_Ra
c.
Predictors: (Constant), Ratio_1980, Poverty_Ra, Ratio_1950
d. Predictors: (Constant), Ratio_1980, Poverty_Ra, Ratio_1950, AVE_FAM_SZ
e. Predictors: (Constant), Ratio_1980, Poverty_Ra, Ratio_1950, AVE_FAM_SZ, Ratio_One_
f. Dependent Variable: Ln(Ratio_Pb_10+1)

Table 7.17: Coefficients of the Indianapolis Model using the stepwise method

| Model | Predictor | Unstd. B | Unstd. Std. Error | Std. Beta | t | Sig. | Corr. Zero-order | Corr. Partial | Corr. Part | Tolerance | VIF |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | (Constant) | -1.080 | .710 | | -1.521 | .131 | | | | | |
| | Ratio_1980 | -1.168 | .214 | -.358 | -5.466 | .000 | -.616 | -.448 | -.285 | .634 | 1.577 |
| | Poverty_Ra | 1.879 | .338 | .398 | 5.565 | .000 | .603 | .454 | .290 | .532 | 1.879 |
| | Ratio_1950 | 1.586 | .437 | .199 | 3.630 | .000 | .337 | .316 | .189 | .908 | 1.102 |
| | AVE_FAM_SZ | 1.100 | .239 | .348 | 4.593 | .000 | .511 | .388 | .239 | .473 | 2.112 |
| | Ratio_One_ | -2.778 | .995 | -.234 | -2.790 | .006 | .121 | -.248 | -.145 | .388 | 2.577 |

a. Dependent Variable: Ln(Ratio_Pb_10+1)

Table 7.18: Results of the backward elimination method for selected tracts in Indianapolis (Model Summary)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | R Square Change | F Change | df1 | df2 | Sig. F Change | Durbin-Watson |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | .842a | .709 | .678 | ********** | .709 | 22.739 | 12 | 112 | .000 | |
| 2 | .842b | .709 | .681 | ********** | .000 | .035 | 1 | 112 | .852 | |
| 3 | .842c | .709 | .683 | ********** | .000 | .152 | 1 | 113 | .697 | |
| 4 | .841d | .708 | .685 | ********** | -.001 | .344 | 1 | 114 | .559 | |
| 5 | .839e | .703 | .683 | ********** | -.004 | 1.739 | 1 | 115 | .190 | |
| 6 | .835f | .698 | .679 | ********** | -.006 | 2.207 | 1 | 116 | .140 | 2.056 |

a. Predictors: (Constant), Poverty_Ra, Ratio_Hisp, AVE_HH_SZ, Ratio_1950, Ratio_1980, Ratio_Rent, Ratio_Blac, Ratio_Vaca, Ratio_H_Ed, Poverty_Wi, Ratio_One_, AVE_FAM_SZ
b. Predictors: (Constant), Poverty_Ra, Ratio_Hisp, AVE_HH_SZ, Ratio_1950, Ratio_1980, Ratio_Rent, Ratio_Vaca, Ratio_H_Ed, Poverty_Wi, Ratio_One_, AVE_FAM_SZ
c. Predictors: (Constant), Poverty_Ra, Ratio_Hisp, AVE_HH_SZ, Ratio_1950, Ratio_1980, Ratio_Vaca, Ratio_H_Ed, Poverty_Wi, Ratio_One_, AVE_FAM_SZ
d.
Predictors: (Constant), Poverty_Ra, Ratio_Hisp, AVE_HH_SZ, Ratio_1950, Ratio_1980, Ratio_H_Ed, Poverty_Wi, Ratio_One_, AVE_FAM_SZ
e. Predictors: (Constant), Poverty_Ra, AVE_HH_SZ, Ratio_1950, Ratio_1980, Ratio_H_Ed, Poverty_Wi, Ratio_One_, AVE_FAM_SZ
f. Predictors: (Constant), Poverty_Ra, AVE_HH_SZ, Ratio_1950, Ratio_1980, Ratio_H_Ed, Poverty_Wi, AVE_FAM_SZ
g. Dependent Variable: Ln(Ratio_Pb_10+1)

Table 7.19: Coefficients of the Indianapolis Model using the backward elimination method

| Model | Predictor | Unstd. B | Unstd. Std. Error | Std. Beta | t | Sig. | Corr. Zero-order | Corr. Partial | Corr. Part | Tolerance | VIF |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | (Constant) | .867 | .890 | | .975 | .332 | | | | | |
| | Poverty_Ra | 1.626 | .513 | .344 | 3.170 | .002 | .603 | .281 | .161 | .219 | 4.563 |
| | AVE_HH_SZ | -.677 | .255 | -.289 | -2.656 | .009 | .165 | -.238 | -.135 | .219 | 4.576 |
| | Ratio_1950 | 1.319 | .440 | .165 | 3.000 | .003 | .337 | .267 | .153 | .852 | 1.174 |
| | Ratio_1980 | -1.215 | .203 | -.372 | -5.975 | .000 | -.616 | -.484 | -.304 | .667 | 1.500 |
| | Ratio_H_Ed | -1.285 | .445 | -.207 | -2.886 | .005 | -.573 | -.258 | -.147 | .502 | 1.991 |
| | Poverty_Wi | -1.953 | .968 | -.183 | -2.018 | .046 | .401 | -.183 | -.103 | .313 | 3.193 |
| | AVE_FAM_SZ | 1.286 | .379 | .407 | 3.392 | .001 | .511 | .299 | .172 | .180 | 5.568 |

a. Dependent Variable: Ln(Ratio_Pb_10+1)

Figure 7.8: Distribution of dependent variable according to each independent variable in Indianapolis

7.5 Elkhart and Goshen Model:

Table 7.20: Tests of normality for the residuals of the dependent variable in the Elkhart and Goshen Model

| | Kolmogorov-Smirnov(a) Statistic | df | Sig. | Shapiro-Wilk Statistic | df | Sig. |
|---|---|---|---|---|---|---|
| Unstandardized Residual | .140 | 11 | .200* | .949 | 11 | .632 |

*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction

Figure 7.9: Histogram of the residuals of the dependent variable in Elkhart and Goshen.

Table 7.21: Results of the stepwise method for selected tracts in Elkhart and Goshen (Model Summary)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | R Square Change | F Change | df1 | df2 | Sig. F Change | Durbin-Watson |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | .807a | .651 | .612 | ********** | .651 | 16.801 | 1 | 9 | .003 | 1.616 |

a. Predictors: (Constant), Ratio_Hisp
b. Dependent Variable: Ln(Ratio_Pb_10+1)

Table 7.22: Coefficients of the Elkhart and Goshen Model using the stepwise method

| Model | Predictor | Unstd. B | Unstd. Std. Error | Std. Beta | t | Sig. | Corr. Zero-order | Corr. Partial | Corr. Part | Tolerance | VIF |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | (Constant) | 1.447 | .235 | | 6.155 | .000 | | | | | |
| | Ratio_Hisp | 4.844 | 1.182 | .807 | 4.099 | .003 | .807 | .807 | .807 | 1.000 | 1.000 |

a. Dependent Variable: Ln(Ratio_Pb_10+1)

7.6 South Bend and Mishawaka Model:

Table 7.23: Tests of normality for the residuals of the dependent variable in the South Bend and Mishawaka Model

| | Kolmogorov-Smirnov(a) Statistic | df | Sig. | Shapiro-Wilk Statistic | df | Sig. |
|---|---|---|---|---|---|---|
| Unstandardized Residual | .111 | 33 | .200* | .972 | 33 | .527 |

*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction

Figure 7.10: Histogram of the residuals of the dependent variable in South Bend and Mishawaka.

Table 7.24: Results of the stepwise method for selected tracts in South Bend and Mishawaka (Model Summary)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | R Square Change | F Change | df1 | df2 | Sig. F Change | Durbin-Watson |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | .782a | .612 | .599 | ********** | .612 | 48.803 | 1 | 31 | .000 | |
| 2 | .830b | .689 | .668 | ********** | .077 | 7.447 | 1 | 30 | .011 | 2.411 |

a. Predictors: (Constant), Ratio_Vaca
b. Predictors: (Constant), Ratio_Vaca, AVE_FAM_SZ
c. Dependent Variable: Ln(Ratio_Pb_10+1)

Table 7.25: Results of the backward elimination method for selected tracts in South Bend and Mishawaka (Model Summary)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | R Square Change | F Change | df1 | df2 | Sig. F Change | Durbin-Watson |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | .864a | .747 | .595 | ********** | .747 | 4.924 | 12 | 20 | .001 | |
| 2 | .864b | .747 | .615 | ********** | .000 | .001 | 1 | 20 | .977 | |
| 3 | .864c | .747 | .632 | ********** | .000 | .005 | 1 | 21 | .945 | |
| 4 | .864d | .747 | .648 | ********** | .000 | .014 | 1 | 22 | .907 | |
| 5 | .864e | .747 | .662 | ********** | .000 | .028 | 1 | 23 | .869 | |
| 6 | .864f | .746 | .675 | ********** | .000 | .046 | 1 | 24 | .832 | |
| 7 | .863g | .745 | .686 | ********** | -.001 | .137 | 1 | 25 | .715 | |
| 8 | .861h | .741 | .693 | ********** | -.004 | .405 | 1 | 26 | .530 | |
| 9 | .856i | .732 | .694 | ********** | -.008 | .884 | 1 | 27 | .355 | |
| 10 | .845j | .713 | .684 | ********** | -.019 | 1.988 | 1 | 28 | .170 | |
| 11 | .830k | .689 | .668 | ********** | -.024 | 2.473 | 1 | 29 | .127 | 2.411 |

a. Predictors: (Constant), Ratio_One_, Ratio_1980, Ratio_Rent, Ratio_Hisp, Poverty_Wi, Ratio_1950, Ratio_Vaca, Ratio_H_Ed, Ratio_Blac, Poverty_Ra, AVE_HH_SZ, AVE_FAM_SZ
b. Predictors: (Constant), Ratio_One_, Ratio_1980, Ratio_Hisp, Poverty_Wi, Ratio_1950, Ratio_Vaca, Ratio_H_Ed, Ratio_Blac, Poverty_Ra, AVE_HH_SZ, AVE_FAM_SZ
c. Predictors: (Constant), Ratio_1980, Ratio_Hisp, Poverty_Wi, Ratio_1950, Ratio_Vaca, Ratio_H_Ed, Ratio_Blac, Poverty_Ra, AVE_HH_SZ, AVE_FAM_SZ
d. Predictors: (Constant), Ratio_1980, Ratio_Hisp, Poverty_Wi, Ratio_1950, Ratio_Vaca, Ratio_H_Ed, Poverty_Ra, AVE_HH_SZ, AVE_FAM_SZ
e. Predictors: (Constant), Ratio_1980, Ratio_Hisp, Poverty_Wi, Ratio_Vaca, Ratio_H_Ed, Poverty_Ra, AVE_HH_SZ, AVE_FAM_SZ
f. Predictors: (Constant), Ratio_1980, Ratio_Hisp, Poverty_Wi, Ratio_Vaca, Ratio_H_Ed, AVE_HH_SZ, AVE_FAM_SZ
g. Predictors: (Constant), Ratio_1980, Poverty_Wi, Ratio_Vaca, Ratio_H_Ed, AVE_HH_SZ, AVE_FAM_SZ
h. Predictors: (Constant), Poverty_Wi, Ratio_Vaca, Ratio_H_Ed, AVE_HH_SZ, AVE_FAM_SZ
i. Predictors: (Constant), Ratio_Vaca, Ratio_H_Ed, AVE_HH_SZ, AVE_FAM_SZ
j. Predictors: (Constant), Ratio_Vaca, AVE_HH_SZ, AVE_FAM_SZ
k. Predictors: (Constant), Ratio_Vaca, AVE_FAM_SZ
l. Dependent Variable: Ln(Ratio_Pb_10+1)

Table 7.26: Coefficients of the South Bend and Mishawaka Model using the backward elimination method

| Model | Predictor | Unstd. B | Unstd. Std. Error | Std. Beta | t | Sig. | Corr. Zero-order | Corr. Partial | Corr. Part | Tolerance | VIF |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | (Constant) | -.373 | .682 | | -.547 | .589 | | | | | |
| | AVE_FAM_SZ | .661 | .242 | .368 | 2.729 | .011 | .722 | .446 | .278 | .571 | 1.751 |
| | Ratio_Vaca | 7.392 | 1.841 | .541 | 4.016 | .000 | .782 | .591 | .409 | .571 | 1.751 |

a. Dependent Variable: Ln(Ratio_Pb_10+1)

Figure 7.11: Distribution of dependent variable according to each independent variable in South Bend and Mishawaka

7.7 Fort Wayne Model:

Table 7.27: Tests of normality for the residuals of the dependent variable in the Fort Wayne Model

| | Kolmogorov-Smirnov(a) Statistic | df | Sig. | Shapiro-Wilk Statistic | df | Sig. |
|---|---|---|---|---|---|---|
| Unstandardized Residual | .117 | 33 | .200* | .958 | 33 | .230 |

*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction

Figure 7.12: Histogram of the residuals of the dependent variable in Fort Wayne

Table 7.28: Results of the stepwise method for selected tracts in Fort Wayne (Model Summary)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | R Square Change | F Change | df1 | df2 | Sig. F Change | Durbin-Watson |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | .685a | .469 | .451 | ********** | .469 | 27.338 | 1 | 31 | .000 | |
| 2 | .754b | .568 | .540 | ********** | .100 | 6.927 | 1 | 30 | .013 | 1.566 |

a. Predictors: (Constant), Ratio_H_Ed
b. Predictors: (Constant), Ratio_H_Ed, Ratio_1950
c. Dependent Variable: Ln(Ratio_Pb_10+1)

Table 7.29: Coefficients of the Fort Wayne Model using the stepwise method

| Model | Predictor | Unstd. B | Unstd. Std. Error | Std. Beta | t | Sig. | Corr. Zero-order | Corr. Partial | Corr. Part | Tolerance | VIF |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | (Constant) | 6.645 | .808 | | 8.221 | .000 | | | | | |
| | Ratio_H_Ed | -5.126 | .980 | -.685 | -5.229 | .000 | -.685 | -.685 | -.685 | 1.000 | 1.000 |
| 2 | (Constant) | 5.269 | .907 | | 5.813 | .000 | | | | | |
| | Ratio_H_Ed | -3.981 | .998 | -.532 | -3.988 | .000 | -.685 | -.589 | -.478 | .810 | 1.235 |
| | Ratio_1950 | 3.259 | 1.238 | .351 | 2.632 | .013 | .583 | .433 | .316 | .810 | 1.235 |

a. Dependent Variable: Ln(Ratio_Pb_10+1)

Table 7.30: Results of the backward elimination method for selected tracts in Fort Wayne (Model Summary)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | R Square Change | F Change | df1 | df2 | Sig. F Change | Durbin-Watson |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | .920a | .847 | .755 | ********** | .847 | 9.202 | 12 | 20 | .000 | |
| 2 | .920b | .847 | .766 | ********** | .000 | .008 | 1 | 20 | .930 | |
| 3 | .920c | .846 | .777 | ********** | .000 | .013 | 1 | 21 | .910 | |
| 4 | .920d | .846 | .785 | ********** | -.001 | .121 | 1 | 22 | .731 | |
| 5 | .917e | .841 | .788 | ********** | -.004 | .651 | 1 | 23 | .428 | |
| 6 | .912f | .831 | .784 | ********** | -.010 | 1.512 | 1 | 24 | .231 | |
| 7 | .908g | .825 | .785 | ********** | -.006 | .929 | 1 | 25 | .344 | 2.445 |

a. Predictors: (Constant), Poverty_Ra, Ratio_1980, Ratio_Rent, Ratio_1950, Ratio_Hisp, Ratio_Blac, Ratio_H_Ed, Ratio_Vaca, Poverty_Wi, Ratio_One_, AVE_FAM_SZ, AVE_HH_SZ
b. Predictors: (Constant), Poverty_Ra, Ratio_1980, Ratio_Rent, Ratio_1950, Ratio_Hisp, Ratio_Blac, Ratio_H_Ed, Poverty_Wi, Ratio_One_, AVE_FAM_SZ, AVE_HH_SZ
c. Predictors: (Constant), Poverty_Ra, Ratio_1980, Ratio_Rent, Ratio_1950, Ratio_Hisp, Ratio_Blac, Poverty_Wi, Ratio_One_, AVE_FAM_SZ, AVE_HH_SZ
d. Predictors: (Constant), Poverty_Ra, Ratio_Rent, Ratio_1950, Ratio_Hisp, Ratio_Blac, Poverty_Wi, Ratio_One_, AVE_FAM_SZ, AVE_HH_SZ
e. Predictors: (Constant), Poverty_Ra, Ratio_Rent, Ratio_1950, Ratio_Blac, Poverty_Wi, Ratio_One_, AVE_FAM_SZ, AVE_HH_SZ
f. Predictors: (Constant), Poverty_Ra, Ratio_Rent, Ratio_1950, Ratio_Blac, Poverty_Wi, Ratio_One_, AVE_FAM_SZ
g. Predictors: (Constant), Poverty_Ra, Ratio_Rent, Ratio_1950, Poverty_Wi, Ratio_One_, AVE_FAM_SZ
h. Dependent Variable: Ln(Ratio_Pb_10+1)

Table 7.31: Coefficients of the Fort Wayne Model using the backward elimination method

| Model | Predictor | Unstd. B | Unstd. Std. Error | Std. Beta | t | Sig. | Corr. Zero-order | Corr. Partial | Corr. Part | Tolerance | VIF |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | (Constant) | -5.224 | 1.719 | | -3.040 | .005 | | | | | |
| | Poverty_Ra | 3.594 | 1.171 | .855 | 3.070 | .005 | .582 | .516 | .252 | .087 | 11.523 |
| | Ratio_Rent | 1.887 | .502 | .449 | 3.756 | .001 | .178 | .593 | .308 | .470 | 2.125 |
| | Ratio_1950 | 5.251 | .977 | .565 | 5.375 | .000 | .583 | .725 | .441 | .609 | 1.643 |
| | Poverty_Wi | -5.694 | 1.928 | -.526 | -2.953 | .007 | .429 | -.501 | -.242 | .212 | 4.717 |
| | Ratio_One_ | -7.447 | 1.851 | -.800 | -4.024 | .000 | .394 | -.619 | -.330 | .170 | 5.867 |
| | AVE_FAM_SZ | 2.244 | .567 | .740 | 3.962 | .001 | .621 | .614 | .325 | .193 | 5.177 |

a. Dependent Variable: Ln(Ratio_Pb_10+1)

Figure 7.13: Distribution of dependent variable according to each independent variable in Fort Wayne

7.8 Northern Lake County Model:

Table 7.32: Tests of normality for the residuals of the dependent variable in the Northern Lake County Model

| | Kolmogorov-Smirnov(a) Statistic | df | Sig. | Shapiro-Wilk Statistic | df | Sig. |
|---|---|---|---|---|---|---|
| Unstandardized Residual | .133 | 25 | .200* | .955 | 25 | .317 |

*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction

Figure 7.14: Histogram of the residuals of the dependent variable in Northern Lake County

Table 7.33: Results of the stepwise method for selected tracts in Northern Lake County (Model Summary)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | R Square Change | F Change | df1 | df2 | Sig. F Change | Durbin-Watson |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | .672a | .452 | .428 | ********** | .452 | 18.987 | 1 | 23 | .000 | 1.641 |

a. Predictors: (Constant), Ratio_Vaca
b. Dependent Variable: Ln(Ratio_Pb_10+1)

Table 7.34: Coefficients of the Northern Lake County Model using the stepwise method

| Model | Predictor | Unstd. B | Unstd. Std. Error | Std. Beta | t | Sig. | Corr. Zero-order | Corr. Partial | Corr. Part | Tolerance | VIF |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | (Constant) | 1.255 | .241 | | 5.206 | .000 | | | | | |
| | Ratio_Vaca | 7.484 | 1.718 | .672 | 4.357 | .000 | .672 | .672 | .672 | 1.000 | 1.000 |

a. Dependent Variable: Ln(Ratio_Pb_10+1)

Table 7.35: Results of the backward elimination method for selected tracts in Northern Lake County (Model Summary)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | R Square Change | F Change | df1 | df2 | Sig. F Change | Durbin-Watson |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | .868a | .754 | .507 | ********** | .754 | 3.060 | 12 | 12 | .032 | |
| 2 | .868b | .754 | .545 | ********** | .000 | .005 | 1 | 12 | .946 | |
| 3 | .868c | .753 | .576 | ********** | -.001 | .049 | 1 | 13 | .828 | |
| 4 | .867d | .751 | .602 | ********** | -.002 | .091 | 1 | 14 | .767 | |
| 5 | .865e | .749 | .623 | ********** | -.002 | .142 | 1 | 15 | .711 | |
| 6 | .863f | .744 | .639 | ********** | -.005 | .297 | 1 | 16 | .593 | |
| 7 | .853g | .727 | .636 | ********** | -.017 | 1.131 | 1 | 17 | .302 | |
| 8 | .846h | .716 | .641 | ********** | -.011 | .726 | 1 | 18 | .405 | 2.359 |

a. Predictors: (Constant), Poverty_Ra, AVE_HH_SZ, Ratio_1980, Ratio_1950, Ratio_Blac, Ratio_Rent, Ratio_Vaca, Poverty_Wi, Ratio_H_Ed, Ratio_Hisp, AVE_FAM_SZ, Ratio_One_
b. Predictors: (Constant), Poverty_Ra, AVE_HH_SZ, Ratio_1980, Ratio_1950, Ratio_Blac, Ratio_Rent, Ratio_Vaca, Poverty_Wi, Ratio_H_Ed, Ratio_Hisp, AVE_FAM_SZ
c. Predictors: (Constant), Poverty_Ra, AVE_HH_SZ, Ratio_1980, Ratio_1950, Ratio_Blac, Ratio_Vaca, Poverty_Wi, Ratio_H_Ed, Ratio_Hisp, AVE_FAM_SZ
d. Predictors: (Constant), Poverty_Ra, AVE_HH_SZ, Ratio_1980, Ratio_1950, Ratio_Blac, Poverty_Wi, Ratio_H_Ed, Ratio_Hisp, AVE_FAM_SZ
e. Predictors: (Constant), Poverty_Ra, Ratio_1980, Ratio_1950, Ratio_Blac, Poverty_Wi, Ratio_H_Ed, Ratio_Hisp, AVE_FAM_SZ
f. Predictors: (Constant), Poverty_Ra, Ratio_1980, Ratio_1950, Ratio_Blac, Ratio_H_Ed, Ratio_Hisp, AVE_FAM_SZ
g. Predictors: (Constant), Poverty_Ra, Ratio_1980, Ratio_1950, Ratio_Blac, Ratio_Hisp, AVE_FAM_SZ
h. Predictors: (Constant), Poverty_Ra, Ratio_1980, Ratio_Blac, Ratio_Hisp, AVE_FAM_SZ
i.
Dependent Variable: Ln(Ratio_Pb_10+1)

Table 7.36: Coefficients of the Northern Lake County Model using the backward elimination method

| Model | Predictor | Unstd. B | Unstd. Std. Error | Std. Beta | t | Sig. | Corr. Zero-order | Corr. Partial | Corr. Part | Tolerance | VIF |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | (Constant) | .218 | 1.551 | | .141 | .890 | | | | | |
| | Poverty_Ra | -1.160 | .615 | -.248 | -1.888 | .074 | .023 | -.397 | -.231 | .868 | 1.153 |
| | Ratio_1980 | -3.454 | .736 | -.664 | -4.693 | .000 | -.589 | -.733 | -.574 | .747 | 1.339 |
| | Ratio_Blac | .696 | .402 | .339 | 1.732 | .100 | .340 | .369 | .212 | .391 | 2.558 |
| | Ratio_Hisp | -1.242 | .699 | -.385 | -1.778 | .091 | -.112 | -.378 | -.217 | .319 | 3.139 |
| | AVE_FAM_SZ | 1.361 | .408 | .483 | 3.340 | .003 | .356 | .608 | .408 | .714 | 1.400 |

a. Dependent Variable: Ln(Ratio_Pb_10+1)

Figure 7.15: Distribution of dependent variable according to each independent variable in Northern Lake County

7.9 Clarksville, New Albany and Jeffersonville Model:

Table 7.37: Tests of normality for the residuals of the dependent variable in the Clarksville, New Albany and Jeffersonville Model

| | Kolmogorov-Smirnov(a) Statistic | df | Sig. | Shapiro-Wilk Statistic | df | Sig. |
|---|---|---|---|---|---|---|
| Unstandardized Residual | .128 | 26 | .200* | .976 | 26 | .768 |

*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction

Figure 7.16: Histogram of the residuals of the dependent variable in Clarksville, New Albany and Jeffersonville

Table 7.38: Results of the stepwise method for selected tracts in Clarksville, New Albany and Jeffersonville (Model Summary)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | R Square Change | F Change | df1 | df2 | Sig. F Change | Durbin-Watson |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | .692a | .479 | .457 | ********** | .479 | 22.041 | 1 | 24 | .000 | |
| 2 | .771b | .594 | .559 | ********** | .115 | 6.521 | 1 | 23 | .018 | 1.935 |

a. Predictors: (Constant), Ratio_H_Ed
b. Predictors: (Constant), Ratio_H_Ed, Ratio_Vaca
c.
Dependent Variable: Ln(Ratio_Pb_10+1)

Table 7.39: Coefficients of the Clarksville, New Albany and Jeffersonville Model using the stepwise method

| Model | Predictor | Unstd. B | Unstd. Std. Error | Std. Beta | t | Sig. | Corr. Zero-order | Corr. Partial | Corr. Part | Tolerance | VIF |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | (Constant) | 6.102 | 1.060 | | 5.758 | .000 | | | | | |
| | Ratio_H_Ed | -6.145 | 1.309 | -.692 | -4.695 | .000 | -.692 | -.692 | -.692 | 1.000 | 1.000 |
| 2 | (Constant) | 3.769 | 1.322 | | 2.851 | .009 | | | | | |
| | Ratio_H_Ed | -4.215 | 1.401 | -.475 | -3.008 | .006 | -.692 | -.531 | -.400 | .709 | 1.410 |
| | Ratio_Vaca | 12.216 | 4.784 | .403 | 2.554 | .018 | .659 | .470 | .339 | .709 | 1.410 |

a. Dependent Variable: Ln(Ratio_Pb_10+1)

Table 7.40: Results of the backward elimination method for selected tracts in Clarksville, New Albany and Jeffersonville (Model Summary)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | R Square Change | F Change | df1 | df2 | Sig. F Change | Durbin-Watson |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | .885a | .783 | .582 | ********** | .783 | 3.899 | 12 | 13 | .011 | |
| 2 | .884b | .781 | .609 | ********** | -.002 | .103 | 1 | 13 | .753 | |
| 3 | .884c | .781 | .634 | ********** | .000 | .013 | 1 | 14 | .910 | |
| 4 | .876d | .767 | .636 | ********** | -.013 | .909 | 1 | 15 | .355 | |
| 5 | .870e | .757 | .643 | ********** | -.010 | .703 | 1 | 16 | .414 | |
| 6 | .868f | .754 | .658 | ********** | -.003 | .222 | 1 | 17 | .644 | |
| 7 | .845g | .715 | .624 | ********** | -.039 | 2.880 | 1 | 18 | .107 | 2.270 |

a. Predictors: (Constant), Poverty_Ra, Ratio_1980, AVE_HH_SZ, Ratio_Hisp, Ratio_1950, Ratio_Blac, Ratio_Vaca, Ratio_H_Ed, Ratio_Rent, AVE_FAM_SZ, Ratio_One_, Poverty_Wi
b. Predictors: (Constant), Poverty_Ra, Ratio_1980, AVE_HH_SZ, Ratio_Hisp, Ratio_1950, Ratio_Blac, Ratio_Vaca, Ratio_H_Ed, AVE_FAM_SZ, Ratio_One_, Poverty_Wi
c. Predictors: (Constant), Poverty_Ra, Ratio_1980, AVE_HH_SZ, Ratio_Hisp, Ratio_1950, Ratio_Blac, Ratio_Vaca, Ratio_H_Ed, AVE_FAM_SZ, Poverty_Wi
d. Predictors: (Constant), Poverty_Ra, Ratio_1980, AVE_HH_SZ, Ratio_1950, Ratio_Blac, Ratio_Vaca, Ratio_H_Ed, AVE_FAM_SZ, Poverty_Wi
e.
Predictors: (Constant), Poverty_Ra, Ratio_1980, AVE_HH_SZ, Ratio_1950, Ratio_Vaca, Ratio_H_Ed, AVE_FAM_SZ, Poverty_Wi
f. Predictors: (Constant), Poverty_Ra, Ratio_1980, AVE_HH_SZ, Ratio_1950, Ratio_Vaca, Ratio_H_Ed, Poverty_Wi
g. Predictors: (Constant), Poverty_Ra, AVE_HH_SZ, Ratio_1950, Ratio_Vaca, Ratio_H_Ed, Poverty_Wi
h. Dependent Variable: Ln(Ratio_Pb_10+1)

Table 7.41: Coefficients of the Clarksville, New Albany and Jeffersonville Model using the backward elimination method

| Model | Predictor | Unstd. B | Unstd. Std. Error | Std. Beta | t | Sig. | Corr. Zero-order | Corr. Partial | Corr. Part | Tolerance | VIF |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | (Constant) | 1.011 | 1.904 | | .531 | .602 | | | | | |
| | Poverty_Ra | -4.734 | 2.450 | -.950 | -1.932 | .068 | .495 | -.405 | -.237 | .062 | 16.107 |
| | AVE_HH_SZ | 1.133 | .524 | .327 | 2.162 | .044 | -.179 | .444 | .265 | .658 | 1.520 |
| | Ratio_1950 | 3.051 | 1.559 | .287 | 1.957 | .065 | .431 | .410 | .240 | .697 | 1.434 |
| | Ratio_Vaca | 15.975 | 4.878 | .527 | 3.275 | .004 | .659 | .601 | .401 | .580 | 1.724 |
| | Ratio_H_Ed | -4.620 | 1.715 | -.520 | -2.695 | .014 | -.692 | -.526 | -.330 | .403 | 2.481 |
| | Poverty_Wi | 8.966 | 4.362 | .917 | 2.056 | .054 | .450 | .427 | .252 | .076 | 13.234 |

a. Dependent Variable: Ln(Ratio_Pb_10+1)

Figure 7.17: Distribution of dependent variable according to each independent variable in Clarksville, New Albany and Jeffersonville
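As a reading aid for the coefficient and model summary tables in this chapter, the following is a minimal Python sketch, not part of the original SPSS analysis, that recomputes two reported statistics from other reported values: VIF as the reciprocal of Tolerance, and Adjusted R Square from R Square, the number of tracts n, and the number of predictors p. The function names are hypothetical, and n = 26 for Clarksville, New Albany and Jeffersonville is assumed from the df column of Table 7.37. Small differences from the printed values are expected because the tables report rounded inputs.

```python
def vif_from_tolerance(tolerance: float) -> float:
    """Variance inflation factor, defined as VIF = 1 / Tolerance."""
    return 1.0 / tolerance


def adjusted_r_square(r_square: float, n: int, p: int) -> float:
    """Adjusted R Square for n observations and p predictors plus a constant."""
    return 1.0 - (1.0 - r_square) * (n - 1) / (n - p - 1)


# Reported (Tolerance, VIF) pairs copied from Table 7.41.
table_7_41 = {
    "Poverty_Ra": (0.062, 16.107),
    "AVE_HH_SZ": (0.658, 1.520),
    "Ratio_1950": (0.697, 1.434),
    "Ratio_Vaca": (0.580, 1.724),
    "Ratio_H_Ed": (0.403, 2.481),
    "Poverty_Wi": (0.076, 13.234),
}

for name, (tolerance, reported_vif) in table_7_41.items():
    recomputed = vif_from_tolerance(tolerance)
    print(f"{name}: recomputed VIF {recomputed:.3f}, reported {reported_vif:.3f}")

# Table 7.38, model 2: R Square .594 with p = 2 predictors and n = 26 tracts
# (n is an assumption taken from the df column of Table 7.37).
print(f"Adjusted R Square: {adjusted_r_square(0.594, n=26, p=2):.3f}")
```

The recomputed Adjusted R Square agrees with the .559 reported in Table 7.38 to three decimals. Predictors whose VIF exceeds roughly 10 (a common rule of thumb, not a threshold stated in this thesis), such as Poverty_Ra and Poverty_Wi in Table 7.41, are often read as showing strong collinearity.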