Project:Current events:A2 DD:FocusHOPE

Focus: HOPE: Since its founding in 1968, Focus: HOPE (http://www.focushope.edu/Default.aspx) has gained national renown with its work improving the lives of all residents of Detroit, regardless of race, economic status, national origin or religious persuasion. They have been very active with their food program, career training programs, and their HOPE Village initiative.

Below add discussion, final products, drafts, follow-up discussion, etc.
related to any and all work and projects conducted during the Datadive.

Friday night presentation
Sunday Final presentation

Description of the Data

Link to Focus Hope Dropbox directory (https://www.dropbox.com/s/1b12m21omydb0uy) with data:

Candidate Addresses - Center for Advanced Technologies program, dummy var: Candidate = 1
Current Students Addresses 2011 CAT - Center for Advanced Technologies, dummy var: CAT = 1
EL1 Addresses - Earn and Learn – cohort 1, dummy var: EL1 = 1
EL2 Addresses - Earn and Learn – cohort 2, dummy var: EL2 = 1
HVIBoundaryContactList - HOPE Village Initiative – resident touched by an HVI program, dummy var: HVI = 1
SBA Data_10.01.11 - Sustainable Broadband Adopter (Connect Your Communities Program), dummy var: SBA = 1
SBA Data_10.01.11 (sheet 2) - Sustainable Broadband Adopter (Connect Your Communities Program), dummy var: SBA2 = 1
WiMaxSBAs_10.01.11 - Sustainable Broadband Adopter with WiMax Modem (Connect Your Communities Program), dummy var: SBAWiMax = 1
Center for Working Families - dummy var: CFWF = 1
CrimeJul2011Feb2012 - Crime data for a one-mile radius around Focus Hope's address, with dummy vars for the type of crime committed

Data from Data Driven Detroit:

Neighborhood Amenities = GIS files showing the location of: Colleges and Universities, Fire Stations, Historic Districts, Historic Sites, Libraries, Medical Facilities, City Halls, Other Schools, Parks, Police Stations, Public Schools, Shopping Centers
ACS_Data - American Community Survey Data for the Focus: HOPE area, 2006 - 2010.
See the GUIDE_DOCS file within this file for more details of what the various census codes mean (could be used to compare the Focus: HOPE area to the rest of Detroit or MI).

NOTE: They have far more data than they have provided (including census shape files, the Detroit City Budget, local restaurants, and the locations of Detroit's Alternative Food Access Programs); it just wouldn't all fit in the Dropbox. See the Data Ambassadors for the full list.

Google Doc Guide to Focus:HOPE data sources: (https://docs.google.com/a/umich.edu/spreadsheet/ccc?key=0AknN2a_xojvPdGVlUHRnZ295MmlWVVduTWpnQnJ6cHc#gid=0)

Topics to analyze

1. Participant data
Tom Peppard (Geolocation team?)
2. Demographic profiles for HOPE Village Initiative
Mikko Tuomela - can also help with visualizing results
Drew Tom Sastastic Whitney
3. Transit/access/safety (safe routes to school, traffic safety problems, W. Davison)
James Warila
Chad

(Contact information of the data scientists who worked on the Focus Hope project on Saturday)

Material for A2 Data Dive

FocusHOPE Brainstorming Google Doc (https://docs.google.com/a/umich.edu/document/d/1qa0Jf4BDy7rJDwZ5iupxbClBlc8Vz69bKgvD0gu4g_4/edit?pli=1#heading=h.i638lwx8hvw8)
Map outlining Hope Village Initiative Boundaries (https://maps.google.com/maps/ms?msid=212886075492773777177.0004b8b337ab96aa83194&msa=0&ll=42.398742,-83.121099)

Questions

General
What can we say about the participants of the various Focus: HOPE programs geographically?
Are participants attending multiple programs?
What can we say about the neighborhood of the Hope Village Initiative in relation to the rest of Detroit, or MI?
Who are we impacting currently, and how can we use that information to impact others?

Census data
1. Economy and self-sufficiency: employment, value of homes, income
2. Education
3. Environment: vacancy, occupancy, who has moved out

Additionally:
What is a typical child's experience?
What is a typical senior's experience?
Data Prep

Creating Unique IDs for program participants:

MASTER Participant ID.xlsx contains a sorted list of all the participants from the various program spreadsheets (*For the privacy of the participants this will not be distributed). The following instructions walk you through the process of eliminating the repeated participants using Excel. The purpose of creating UniqueIDs across all of the spreadsheets is to see if we can identify participants who attended multiple programs.

Once I had all the addresses together (which included repeated addresses for participants who attended multiple programs), I selected "Data," then "Advanced Filter" in Excel, and a window popped up. Under "Action" I selected "Copy to another location". For "List Range" I selected all of the addresses (including repeats); under "Copy To" I selected a cell that was not in the column my data was coming from. Lastly, I checked the box that says "Unique records only."

Then I copied the list of unique address records to cell A2 (cell A1 was titled "Addresses"). Cell B1 was labeled UniqueID. I started the IDs at 100. Excel allows you to increment number values by dragging the autofill handle down the column (for more information see: http://spreadsheets.about.com/od/a/g/autofill_def.htm).

Once I had built the unique IDs, I could go back to the individual program participant workbooks and add the unique ID to each record using Excel's INDEX and MATCH functions. I use MATCH to identify *where* on the spreadsheet the address I want to match is located, and I use INDEX to return the value of the matching unique ID.
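For readers who would rather not do this in Excel, the same two steps (build one ID per distinct address, then look each record's address up in that list) can be sketched in Python. This is only an illustration, not the method used at the Datadive: the function names, the `address` field name, and the sample rows are all made up, and IDs are assigned in first-seen order rather than from the sorted master list.

```python
def build_unique_ids(addresses, start=100):
    """Assign one ID per distinct address, in first-seen order, starting at `start`.

    Mirrors the "Unique records only" filter plus the autofilled ID column.
    """
    ids = {}
    for address in addresses:
        if address not in ids:
            ids[address] = start + len(ids)
    return ids


def attach_ids(records, ids, address_field="address"):
    """Return copies of the records with a UniqueID looked up by address.

    Plays the role of INDEX/MATCH; unmatched addresses get None
    (where Excel would show #N/A).
    """
    return [dict(r, UniqueID=ids.get(r[address_field])) for r in records]


# Hypothetical sample rows standing in for two program workbooks.
program_a = [{"address": "123 MAIN ST"}, {"address": "45 OAK AVE"}]
program_b = [{"address": "45 OAK AVE"}, {"address": "9 ELM CT"}]

all_addresses = [r["address"] for r in program_a + program_b]
master = build_unique_ids(all_addresses)
program_a_ids = attach_ids(program_a, master)
program_b_ids = attach_ids(program_b, master)
```

Because the lookup key is the address, two people at one address still receive the same ID, exactly as in the spreadsheet version.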
In each workbook I created a UniqueID column and entered this formula to match the IDs to the addresses:

=INDEX('[MASTER Participant ID.xlsx]Sheet1'!$A$1:$B$1291,MATCH($B2,'[MASTER Participant ID.xlsx]Sheet1'!$A$1:$A$1291,0),2)

After the UniqueIDs were filled in using this formula, I selected the cells containing the formula, copied them (Ctrl-C), and then, in the same place, chose "Paste Special" (under Edit) and selected "Values". This removes the formula, makes the worksheet run faster (since it is no longer looking in another workbook for information), and leaves the IDs hard coded.

One limitation of this approach is that, because the unique ID is based on addresses and not names, participants listed at the same address (e.g. an apartment complex) might share the same unique ID. For the purposes of mapping this shouldn't be a big deal, but it may overstate other calculations, for example counts of participants who appear in more than one program.

Anonymizing Participant Addresses

To protect the privacy of the participants, and to more closely match data from Crimemapping.com, I used Excel formulas again to change everything but the first two digits of an address number to zeros. We can get Excel to give us the number in an address (5 digits, 3 digits, etc.) by finding the first space in the address field, starting from the left, and subtracting one:

=LEFT(A2,FIND(" ",A2)-1)

With those numbers we can add trailing zeros using a call to the REPLACE function together with the REPT (repeat) function.
Given the length of the number, and the fact that we want to keep the first two digits in place, we replace the remaining digits with zeros:

=REPLACE(C3,3,LEN(C3)-2,REPT("0",LEN(C3)-2))

For numbers that had 3 digits or fewer, we replaced everything but the first digit:

=REPLACE(C3,2,LEN(C3)-1,REPT("0",LEN(C3)-1))

Putting them together in the same cell formula gives you:

=IF(LEN(C3)>3,REPLACE(C3,3,LEN(C3)-2,REPT("0",LEN(C3)-2)),REPLACE(C3,2,LEN(C3)-1,REPT("0",LEN(C3)-1)))

To then put the entire address back together you can use the CONCATENATE function (keeping in mind that the parts will not necessarily be in the same cells as in this formula). Essentially, it joins each argument (cell) together in the order that you list them. In the following formula, I combine CONCATENATE with the REPLACE function to swap in the new address number:

=CONCATENATE(D2, (REPLACE(A2,1,LEN(C2),"")), ", ",B2)

Data Processing

Geo-coding address data

File: geocoded_addresses2.csv [ADD LINK]

We used Stata for the following procedures.

Merging

We merged the following files:
Candidate Address (cleaned)
Center for Working Families (cleaned)
Current Students Addresses 2011 CAT (cleaned)
EL1 Addresses (cleaned)
EL2 Addresses (cleaned)
HVIBoundaryContactList (cleaned)
SBA Data_10.01.11 (cleaned)
SBA Data_10.01.11 (sheet 2) (cleaned)
WiMaxSBAs_10.01.11 (cleaned)

In the merged data we added a field to indicate which file the address came from. The files are mapped as follows:
Candidate Addresses - CAT Candidates
Center for Working Families - CFWF
Current Students Addresses 2011 CAT - CAT Current Students
EL1 Addresses - EL1 Cohort 1
EL2 Addresses - EL2 Cohort 2
HVIBoundaryContactList - HVI Touched Resident
SBA Data_10.01.11 - SBA
SBA Data_10.01.11 (sheet 2) - SBA
WiMaxSBAs_10.01.11 - SBA WiMax

Duplicates

We found multiple duplicate uniqueIDs (for example, 669 in file EL2 Addresses). The uniqueIDs are unique to each address.
Each occurrence represents an individual, so duplicates may be multiple people at the same address.

Cleaning

To clean the data for geo-coding, we did the following:
Dropped cases where the uniqueID was missing or the address was missing.
Cleaned the fields to get rid of leading blanks.
Converted everything to uppercase (it helps to have everything in the same format).
Removed bad characters (e.g. Ê Ê, `) manually.

Geo-coding

To get latitude and longitude for each address, we used a function in Stata that makes a call to Google Maps. Our output has the following:
address
number of people at that address
geocode (Google Maps status code, e.g. 200 = no errors)
geoscore (Google Maps accuracy level, e.g. 8 = street-level accuracy); we dropped addresses that had a geoscore less than 8, leaving only addresses located at street-level accuracy or better according to Google's output
latitude
longitude

Mapping the address data

We used the program MapInfo and the geocoded address output from the above section to visualize the geo-spatial data. We used the following files from Data Driven Detroit to map the area boundaries:
A2D2_Area_Boundary.dbf
A2D2_Area_Boundary.prj
A2D2_Area_Boundary.sbn
A2D2_Area_Boundary.sbx
A2D2_Area_Boundary.shp
A2D2_Area_Boundary.shp.xml
A2D2_Area_Boundary.shx

We had trouble mapping the individual programs (e.g. CFWF) from the merged file, so we split the programs into separate csv files and imported those to MapInfo.
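As a rough companion to the Excel and Stata steps described above, here is a minimal Python sketch of three of the rules: masking the house number, cleaning an address record, and dropping low-accuracy geocodes. It is not the code used at the Datadive. The function names and the `uniqueid` and `address` field names are hypothetical, and the choice to leave addresses without a leading number untouched is an assumption; `geoscore` and the threshold of 8 come from the description above.

```python
def mask_house_number(address):
    """Zero out all but the leading digits of the house number.

    Follows the Excel rule: numbers longer than 3 digits keep their first
    two digits, shorter numbers keep only the first digit.
    """
    number, sep, rest = address.partition(" ")
    if not number.isdigit():
        # Assumption: addresses without a leading number are left as-is.
        return address
    keep = 2 if len(number) > 3 else 1
    return number[:keep] + "0" * (len(number) - keep) + sep + rest


def clean_record(record):
    """Trim blanks and uppercase the address; return None when the
    unique ID or the address is missing.

    Note: strip() removes trailing blanks as well as leading ones.
    """
    uid = record.get("uniqueid")
    address = (record.get("address") or "").strip().upper()
    if uid in (None, "") or not address:
        return None
    return {**record, "address": address}


def keep_geocoded(rows, min_score=8):
    """Keep only rows whose geoscore is at least `min_score`."""
    return [r for r in rows if r["geoscore"] >= min_score]
```

For example, `mask_house_number("12345 OAKMAN BLVD")` gives `"12000 OAKMAN BLVD"`, and a three-digit number such as `"123 MAIN ST"` becomes `"100 MAIN ST"`.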
Fusion Table Layers

HVI Boundary Contact List (https://www.google.com/fusiontables/DataSource?snapid=S3939059A2d)
EL1 Addresses (https://www.google.com/fusiontables/DataSource?snapid=S3939062G9p)
EL2 Addresses (https://www.google.com/fusiontables/DataSource?snapid=S393907lmuB)
Detroit Parcels (https://www.google.com/fusiontables/DataSource?snapid=S393908X_KS)
FH HVI Boundary (https://www.google.com/fusiontables/DataSource?snapid=S393910m5fx)
Census Demo1 + ACS Block Group data (https://www.google.com/fusiontables/DataSource?snapid=S393911b1cz)
All Focus:HOPE Program Data (https://www.google.com/fusiontables/DataSource?snapid=S394189R8Ih)
ACS Demographics for Detroit – Tract (https://www.google.com/fusiontables/DataSource?snapid=S394191Z5mq)

Drew did a lot of work using the Google Fusion Table Layer Wizard to add additional layers to the participant data map. In the Dropbox folder he used the following files:
"web_demo_files" folder - contains the html file, which pulls in the information from the Google Fusion tables to build the interactive map
"acsFocusHopeDictionary.csv" and "acsFocusHopeEstimates" files - the median income data, which the Layer Wizard pulled in
"combined_program_participant_files" folder - has the combined spreadsheets of the participant data, which identify the points on the map with the various programs

An interactive web site allows these data to be explored: http://projects.datadrivendetroit.org/FHdatadive/

Findings and Output

A substantial number of participants in student programs appear to live outside of the HVI area. About 1/3 of the participants appear to have graduated from high school. There are high rates of poverty.
[n]

For the stretch of Davison from Dexter to Rosa Parks (data from 2006-2010; note that the numbers are cumulative: 21 accidents in 5 years over a one-mile stretch of road): Traffic accidents were, for some reason, unusually high in the middle of the day and in October. Car-pedestrian accidents are high, while car-car accidents are nil. Traffic injuries seem to be highest (at least per the reports available) between 1 and 2 pm. Mortality from these accidents is low (3), but injuries are high (18) relative to the number of accidents. This may be useful for school outreach if pedestrian access is necessary for children; this is one of the areas with the highest number of car/pedestrian accidents in the Detroit area. Notably, it appears that 2009-2010 experienced lower rates of accidents than did prior years. Data have been uploaded to Dropbox.

A visit to the neighborhood found residents using the pedestrian crossings on Davison Street to access the supermarket. There is also a newer-looking button-activated pedestrian crossing system that provides a countdown timer for the period prior to "don't walk." Is it possible that this new system could explain the drop in accidents in 2009-2010?

SEMCOG Traffic Data and Intersection Crash Frequency (http://www.semcog.org/Data/Maps/roads.map.cfm)

Another observation from the map of participant data, related to traffic accessibility, is that many more program participants are clustered in the HVI neighborhood south of Oakman, while there are many fewer participants in the region north of Oakman. See the map on Data Driven Detroit (http://projects.datadrivendetroit.org/FHdatadive/). One potential explanation (or question to ask) is whether the limited number of streets crossing the industrial zone north of Focus:HOPE makes it difficult for residents to access the facility on Oakman. Many of the companies in this industrial strip are fenced in.
In combination with large fenced facilities such as the former Malcolm X/Robeson Academy, which occupies a large acreage to the north of the facilities, the number of pathways from the neighborhood in the north of the HVI area to Focus:HOPE is small. Facilities for children, such as the Ben Hill and Salsingar playgrounds, don't appear to have a direct pedestrian route to Focus:HOPE. This hypothesis would suggest that the HOPE Village Initiative's "Board Up/Clean Up/Neighborhood Beautification" objective could be served by projects that increase pedestrian and bike access from the northern HVI to Focus:HOPE and the Davison Corridor. Also, on numerous residential streets throughout the HVI, there appears to have been illegal dumping of tires, furniture, and other debris. Could these be barriers to walkability?

One additional follow-up: using the "Smart Street Walk Score" on walkscore.com, it appears that residents of most of the HVI area south of Focus:HOPE can reach the facility within a 15-minute walk. Residents of the northern HVI area (surrounding Fenkell) cannot reach Focus:HOPE within a 15-minute walk.

Presentations

You can find presentations from the Focus HOPE data divers at: A2DataDive: FocusHOPE Final Presentation (http://www.slideshare.net/openmichigan/a2datadive-focus-hope-final-presentation). Special thanks to Nikki for putting this together.

Questions for Further Research

How do we target programs towards the population returning from incarceration?
How can we quantify how people are getting involved in improving their own neighborhoods?
Who are the people in the geo area and/or in the schools that are NOT being served by the Focus programs? (If these people live in the same neighborhoods, this may or may not be a target group for future allocation of resources.)
What are the social, physical, cultural barriers to participation in Focus programs for residents in these identified neighborhoods?
Along this line, we noticed on the map that there is good saturation in the areas just south of the center, where people are accessing services, but less saturation to the north of the center. We tried to call Debbie but got no answer. We wondered why people to the north are not accessing services: is it that they don't need them, that they aren't aware of them, or that they can't access them for some reason? We can't answer these questions without further guidance from Focus/Debbie.
How do HVI participants in Focus:HOPE travel to the facility? Are there barriers, such as traffic safety or limited "walkability" (e.g., limited north/south crossings of the industrial area near Focus:HOPE; illegal dumping on residential streets), that prevent people from making use of Focus:HOPE or other neighborhood attributes?

Recommendations for Future Data Collection

Future participant sheets should have the apartment number (etc.) in a separate field.
Possibly set up a Microsoft Access database. Access could handle unique IDs for you and match people across programs. You can also set up a data entry interface (not ideal), which would allow you to bypass having to use a lot of SQL queries.

Take Advantage of Free Tools (Google Fusion Tables)

You can quickly create a map from a spreadsheet using Google Fusion Tables (https://www.google.com/fusiontables/Home/). First, you have to save the spreadsheet as a CSV (comma-separated value) file. You can do this through the "Save As" command in Excel. Then, go to Google Fusion Tables (https://www.google.com/fusiontables/Home/) and click the "Create" button. Select "Table" from the list. Upload the spreadsheet. An in-depth tutorial can be found at: http://blog.apps.chicagotribune.com/2010/03/04/quickly-visualize-and-map-a-data-set-using-google-fusion-tables/