RNR 416F-516F – Lab 8: Logistic Regression

During lab six, you discovered that there was some clustering of the payday loan centers, although the reason for the clustering was less clear. You are wondering if it is evidence of targeting particular communities. To test this, you decide to create a regression model that you hope will answer the question. One drawback to this week’s tasks: your assistant, Channing, is unable to help you. After that little accident last week, she is taking time off. Because of her laziness, you will be forced to do all the work yourself. Your resentment grows by the minute.

Commands

In order to complete your tasks, you will need to use a variety of tools. The following is a short list of tools you might find helpful for your work today. Remember to try to work through the help files before you ask for assistance.

Tools                   Module
Copy                    Toolbox
Create Random Points    Toolbox
Append                  Toolbox
Identity                Toolbox
Join Field              Toolbox

Processing Steps

This lab will be handled in five sections: 1) data preparation; 2) logistic regression in SPSS; 3) making the regression model; 4) data analysis; and 5) story telling.

Data Preparation

In this section, you will examine the geodatabase and get it ready for the work you will be doing. Your goal is a point feature class that identifies the study and control groups with ones and zeros and contains all the demographic data from the tracts feature class. There are many ways to get to this point. You can find your own way, or you can follow the instructions below to do it as I did it. Note that each of you will get different results in this lab, because each of you is creating your own unique random sample.
1. Add a field called pdl to the PLC feature class. You will use this field to differentiate between the study and the control groups.
2. Do the records in the PLC feature class represent the study or the control group?
Based on your answer to this question, calculate the appropriate value into this field. If you are unsure, ask your neighbor. That way you can be unsure together.
3. Create a feature class of 1000 random points. Call the output feature class ran_sam. Make sure you set the constraining extent to tracts. This is a point where students often make a mistake, so before you continue, make sure your new feature class actually has 1000 points.
4. Add a field called pdl to the ran_sam feature class. You will use this field to differentiate between the study and the control groups.
5. Do the records in the ran_sam feature class represent the study or the control group? Based on your answer to this question, calculate the appropriate value into this field. If you are unsure, ask your neighbor. That way you can be wrong together.
6. For the regression, you will need to have the study and control groups in the same feature class. Copy PLC to a new feature class. Call this feature class regression_points.
7. Append ran_sam to regression_points. Make sure to set the schema type to no test. Check the table for regression_points. You should have 1071 records: 1000 zeros and 71 ones in the pdl field.
8. Although you were consumed with jealousy that Channing was able to lie around watching TV last night (Game of Thrones, probably), you spent the evening getting the tracts ready to create a regression model. You used census attributes to create variables that match the Payday Loan Industry’s published strategy for locating PLC stores. Check the Tracts feature class to make sure that it contains the correct fields:
   Percent of households with income between $25K and $50K
   Percent of households with children
   Percent female head of household
   Percent high school diploma only
   Percent age less than 45
   Percent renter-occupied housing
You should also find empty fields ready to receive values weighted by regression coefficients, as well as a field called model and one called prob_mod.
9.
Remembering how much work it was to collect these data and create the appropriate fields, you decide to make a copy of this feature class, just in case something bad happens to it. Call it safety_tracts.
10. Finally, use an overlay operation (such as Identity) to attach data from tracts to regression_points. The output should be called regression_data.

Gary L. Christopherson – Revised 10/28/2014

Logistic Regression in SPSS

Now that your data are in good shape, you are ready to perform a logistic regression. You will use SPSS to do the regression.
1. In ArcMap, export your table, regression_data, as a dBASE file. Call it regression_data.dbf.
2. Open SPSS.
3. In SPSS, open regression_data.dbf.
4. To begin the regression, follow the menu to Analyze > Regression > Binary Logistic.
5. This should start a dialog where you will enter dependent and independent variables. Your dependent variable will be the field with the ones and zeros in it (pdl). The independent variables will be the six demographic data fields identified above.
6. Examine the results of the regression, particularly the pseudo R² values (SPSS reports Cox & Snell R Square and Nagelkerke R Square in the Model Summary table) and the significance of the coefficients. What do these results suggest about the relationship between payday loan centers and the socioeconomic variables used in the regression?
7. If you like, feel free to experiment a bit. You might want to try some of the stepwise options to see if you can improve your model.
8. Once you are done experimenting, export the results of the regression to a Word document named regression_results.doc. Check to make sure your results were exported and are available to you, then close SPSS.

Making the Regression Model

Based on the regression results, you are now ready to create the regression model in ArcGIS. Your goal is to apply the model created by your regression to the tracts feature class.
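Applying the model to a tract is a short chain of arithmetic: weight each of the six variables by its B coefficient, add the corrected constant, and pass the sum through the logistic transformation. The Python sketch below mirrors those steps for a single tract, outside ArcGIS. It is a sketch under stated assumptions: it uses the example coefficients from the SPSS table in step 1 of this section, the sample sizes from Data Preparation (71 stores, 1000 random points), and a constant correction that adds ln(n2/n1) to the intercept. The tract values in EXAMPLE_TRACT and all function names are hypothetical, made up for illustration, and are not part of the lab geodatabase.

```python
import math

# Example B coefficients from the sample SPSS table in step 1 (yours will differ).
# Keys are the truncated dBASE field names shown in that table.
COEFFICIENTS = {
    "perc_hh_in": 0.051,
    "perc_hh_w_": -0.064,
    "perc_fem_h": 0.136,
    "perc_hs_di": -0.583,
    "perc_age_L": 0.009,
    "perc_rente": 0.036,
}
CONSTANT = -5.187  # uncorrected intercept from the same table

# Hypothetical tract values, made up for illustration only.
EXAMPLE_TRACT = {
    "perc_hh_in": 28.0,
    "perc_hh_w_": 30.0,
    "perc_fem_h": 14.0,
    "perc_hs_di": 2.5,
    "perc_age_L": 60.0,
    "perc_rente": 55.0,
}


def corrected_constant(alpha, n_study, n_control):
    """Return alpha + ln(n_control / n_study).

    Assumption: the correction rescales the intercept as if the study and
    control groups were the same size.
    """
    return alpha + math.log(n_control / n_study)


def weighted_variables(tract, coefficients):
    """Multiply each unweighted variable by its coefficient (the weight fields)."""
    return {name: b * tract[name] for name, b in coefficients.items()}


def model_value(tract, coefficients, alpha_corrected):
    """Corrected constant plus the sum of the weighted variables (the model field)."""
    return alpha_corrected + sum(weighted_variables(tract, coefficients).values())


def prob_mod(model):
    """Logistic transformation 1 / (1 + exp(-model)), scaled between 0 and 1.

    The two branches are algebraically identical; splitting on the sign keeps
    math.exp() from overflowing for very negative model values.
    """
    if model >= 0:
        return 1.0 / (1.0 + math.exp(-model))
    z = math.exp(model)
    return z / (1.0 + z)


if __name__ == "__main__":
    alpha_c = corrected_constant(CONSTANT, n_study=71, n_control=1000)
    model = model_value(EXAMPLE_TRACT, COEFFICIENTS, alpha_c)
    print(f"corrected constant = {alpha_c:.3f}")
    print(f"model = {model:.3f}, prob_mod = {prob_mod(model):.3f}")
```

In the lab itself, each of these steps is a Field Calculator expression on a tracts field (the weight fields, model, and prob_mod); the sketch only reproduces the arithmetic so you can check a value by hand.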
The finished tracts feature class should contain both unweighted and weighted variables, an attribute containing the predicted values of the model, and an attribute where those values have been scaled between zero and one. Feel free to arrive at this goal on your own terms, or follow the steps below and arrive at it on my terms. One thing I want you to be sure to do: we are testing the public strategy of the payday loan industry, so even if you were able to improve on the model using stepwise regressions, please make the model using all six of their variables.
1. Create weighted variables in the six weight fields by multiplying the unweighted fields by their corresponding regression coefficients. These coefficients are found in the results of the SPSS logistic regression. Mine look like the following, but yours will look different because you used your own unique random sample points. In SPSS, the coefficients are called beta coefficients and are listed in the column headed B. Don’t forget to look in Excel for the complete coefficient value, not the truncated value found in the SPSS table.

Variables in the Equation, Step 1a
Variable      B        S.E.    Wald     df   Sig.   Exp(B)
perc_hh_in    .051     .021    6.008    1    .014   1.052
perc_hh_w_    -.064    .016    15.154   1    .000   .938
perc_fem_h    .136     .026    28.031   1    .000   1.146
perc_hs_di    -.583    .221    6.936    1    .008   .558
perc_age_L    .009     .019    .231     1    .631   1.009
perc_rente    .036     .008    18.915   1    .000   1.037
Constant      -5.187   .735    49.845   1    .000   .006

2. Use a calculator to correct the constant. Because the study and control groups are different sizes, the constant must be corrected. The following equation will correct the constant:

$\alpha' = \alpha + \ln(n_2 / n_1)$

where $\alpha$ is the Y-intercept (constant) in the regression, $\alpha'$ is the corrected intercept, ln is the natural logarithm, $n_1$ is the number of cases in the smaller sample (the study group, 71), and $n_2$ is the number of cases in the larger sample (the control group, 1000). With my constant of -5.187, for example, $\alpha' = -5.187 + \ln(1000/71) \approx -5.187 + 2.645 = -2.542$. (Warren 1990)
3.
To make the model, the weighted variables and the corrected constant are summed. The equation to create the model is: model = corrected constant + weighted variable 1 + weighted variable 2 + weighted variable 3 + weighted variable 4 + weighted variable 5 + weighted variable 6. Use the Field Calculator to perform this equation and place the results in the field called model.
4. The last field you need to fill in is the prob_mod field. The values in a probability model need to be scaled between 0 and 1. You will apply a logistic transformation to the values in the model by using the following equation in the Field Calculator: prob_mod = 1 / (1 + exp(-model)) (Kvamme 1988). Check that the transformation was successful by examining the minimum and maximum values in the prob_mod field; they should be very close to 0 and 1, respectively.

Model Assessment

1. In order to test the efficiency of your model, you need to determine what percentage of the project area falls into the most likely designation, and what percentage of the PLC stores are contained in the most likely area.
2. First, create a binary model based on the values in the prob_mod field. Create a field called bin_mod. Then select all the records where the value in the prob_mod field is greater than or equal to 0.5 (prob_mod >= 0.5), and calculate bin_mod = 1 for these records. Calculate bin_mod = 0 for all other records.
3. Next, determine the percent of the area predicted by the model, that is, all the polygons with a bin_mod value of 1. In the table, summarize on bin_mod and sum the shape_area field. Add this table to the TOC in your map document; open it and look at the fields. Using the values in the table, calculate the percent of the total area that the model is telling you “is a good place to locate PLC stores.” Remember that because some of the tracts are left out of the model due to null values, you cannot use the shape area in clippit to calculate percent area.
4.
Next, determine the percent of PLC stores that are found in the polygons predicted to contain PLC stores. To do this, use Identity to attach values from Tracts to the features in PLC. Call the output PLC_bin. In this new feature class, use the values in the bin_mod field to calculate the percentage of stores contained in polygons predicted to contain PLC stores.
5. Finally, to determine the efficiency of your model, plug the appropriate percentages into this equation:

$\text{efficiency} = 1 - \dfrac{\text{percentage of total area within most likely category}}{\text{percentage of total sites within most likely category}}$ (Kvamme 1988)

Include this equation and its result in your PowerPoint presentation (described below). Based on this equation, do you think there is a strong, moderate, or weak relationship between these stores and the variables identified by the payday loan (PDL) industry?

Channing Alert!!

Just before you started creating a presentation to explain the results of your analysis, you heard your email alert and found this message from Channing, your assistant.

Dear Boss,
I have been lying here for a week now. It is incredibly boring, and I am having a very difficult time with the whole process. I am eating through a straw, I can’t move, and I have had a tormenting itch on my nose for the last five days with no way to scratch it. Thankfully, there is a computer monitor on the ceiling, and somebody to help me use it (hi, my name is doug, and I’m channing’s assistant. I’m typing this email as she dictates it). I’ve been reading PDFs on the monitor about the Payday Loan Industry and ran across one that suggests that their stores are located by targeting non-white residents and high-density populations. I thought you might want to run a regression to test this, so I’ve created a table that will allow you to calculate non-white populations in Tucson. I’ve attached the file to this email. Let me know how it goes.
Right now Doug, my assistant, is telling me that the hand-truck is on the way so he can take me for a walk. When we get back, I’ll check my email to find out how things went.
Good luck,
Your assistant,
Channing

You don’t know if this is good or bad, but you do know that you will be spending more time doing regressions. One good thing: since you have done this before, it should go much quicker this time.
1. Find Channing’s table and open it in ArcGIS.
2. Add a new double precision field called perc_non_white. Using the variables in the table, calculate percent non-white for Tucson tracts.
3. Use the Join Field tool to add perc_non_white to the tracts feature class.
4. In the tracts feature class, add a double precision field called pop_density. Use the Field Calculator to calculate population per square kilometer (if your coordinate system uses meters, Shape_Area is in square meters, so divide it by 1,000,000 to get square kilometers).
5. Add double precision fields called wght_perc_nw, wght_pop_dens, new_model, and new_prob_mod, and a short integer field called new_bin_mod.
6. Use Identity to attach tract variables to the regression_points feature class (the PLC stores plus your random points), so that both the study and control groups are included. Call the results regression_data_2 and export the table so you can use it in SPSS.
7. Perform a logistic regression using just perc_non_white and pop_density. Be sure to export the results of the regression.
8. Note the pseudo R² values, and create a predictive model using the coefficients from the regression.
9. Using new_prob_mod, calculate the ones and zeros for new_bin_mod.
10. Calculate the percent of area and the percent of stores that fall in the model’s most likely designation.
11. Calculate model efficiency.

Data Analysis/Story Telling

1. What do you think this regression means in terms of our question, “Are PLC stores targeting particular communities?”
2. Make a PowerPoint presentation that tells your story of PLC stores in Tucson and the payday loan industry.

Deliverables

Just after you finished your presentation, you were cc’d on an email that Channing’s assistant, Doug, sent to Channing.
In it, he apologized for not being able to help her use the ceiling monitor to read today; he is not feeling well. Doug sent this video by way of explanation: https://www.youtube.com/watch?v=X8Rh3gQEtLE
Based on this video, you are struck by how perfect they are for each other, and you feel certain that a wedding is in their future.

This lab produces several deliverables:
1. A GDB called lab_8.gdb, containing one feature dataset with all the feature classes you used and created in this lab.
2. Lab_8.mxd, which includes the feature classes and tables necessary to answer the question.
3. A PowerPoint presentation that tells your story about how the Payday Loan Industry does or does not target specific communities.

References Cited

Kvamme, Kenneth L.
1988 Development and Testing of Quantitative Models. In Quantifying the Present and Predicting the Past: Theory, Method, and Application of Archaeological Predictive Modeling. W.J. Judge and L. Sebastian, eds. Pp. 325-428. Washington, D.C.: U.S. Government Printing Office.

Warren, Robert E.
1990 Predictive Modeling of Archaeological Site Location: A Case Study in the Midwest. In Interpreting Space: GIS and Archaeology. K.M.S. Allen, S.W. Green, and E.B.W. Zubrow, eds. Pp. 201-215. London: Taylor and Francis.