2012-project-current_events-a2-dd-focushope

Attribution Key
for more information see: http://open.umich.edu/wiki/AttributionPolicy
Use + Share + Adapt
{ Content the copyright holder, author, or law permits you to use, share and adapt. }
Public Domain – Government: Works that are produced by the U.S. Government. (17 USC § 105)
Public Domain – Expired: Works that are no longer protected due to an expired copyright term.
Public Domain – Self Dedicated: Works that a copyright holder has dedicated to the public domain.
Creative Commons – Zero Waiver
Creative Commons – Attribution License
Creative Commons – Attribution Share Alike License
Creative Commons – Attribution Noncommercial License
Creative Commons – Attribution Noncommercial Share Alike License
GNU – Free Documentation License
Make Your Own Assessment
{ Content Open.Michigan believes can be used, shared, and adapted because it is ineligible for copyright. }
Public Domain – Ineligible: Works that are ineligible for copyright protection in the U.S. (17 USC § 102(b)) *laws in
your jurisdiction may differ
{ Content Open.Michigan has used under a Fair Use determination. }
Fair Use: Use of works that is determined to be Fair consistent with the U.S. Copyright Act. (17 USC § 107) *laws in your
jurisdiction may differ
Our determination DOES NOT mean that all uses of this 3rd-party content are Fair Uses and we DO NOT guarantee that
your use of the content is Fair.
To use this content you should do your own independent analysis to determine whether or not your use will be Fair.
Project:Current events:A2 DD:FocusHOPE
Focus: HOPE: Since its founding in 1968, Focus: HOPE
(http://www.focushope.edu/Default.aspx) has gained national renown with its
work improving the lives of all residents of Detroit, regardless of race, economic
status, national origin or religious persuasion. They have been very active with
their food program, career training programs, and their HOPE Village initiative.
Below add discussion, final products, drafts, follow-up discussion, etc. related to
any and all work and projects conducted during the Datadive.


Friday night presentation
Sunday Final presentation
Description of the Data
Link to Focus Hope Dropbox directory
(https://www.dropbox.com/s/1b12m21omydb0uy) with data

Candidate Addresses - Center for Advanced Technologies program, dummy
var: Candidate = 1

Current Students Addresses 2011 CAT - Center for Advanced
Technologies, dummy var: CAT = 1

EL1 Addresses - Earn and Learn – cohort 1, dummy var: EL1 = 1

EL2 Addresses - Earn and Learn – cohort 2, dummy var: EL2 = 1

HVIBoundaryContactList - HOPE Village Initiative – resident touched by an HVI
program, dummy var: HVI = 1

SBA Data_10.01.11 - Sustainable Broadband Adopter (Connect Your
Communities Program), dummy var: SBA = 1

SBA Data_10.01.11 (sheet 2) - Sustainable Broadband Adopter (Connect
Your Communities Program), dummy var: SBA2 = 1

WiMaxSBAs_10.01.11 - Sustainable Broadband Adopter with WiMax Modem
(Connect Your Communities Program), dummy var: SBAWiMax = 1

Center for Working Families, dummy var: CFWF = 1

CrimeJul2011Feb2012 - Crime data for a one-mile radius around Focus Hope's
address, with dummy vars for the type of crime committed

Data from Data Driven Detroit:


Neighborhood Amenities = GIS files showing the location of:

Colleges and Universities,

Fire Stations,

Historic Districts,

Historic Sites,

Libraries,

Medical Facilities,

City Halls,

Other Schools,

Parks,

Police Stations,

Public Schools,

Shopping Centers

ACS_Data - American Community Survey Data for the Focus: HOPE area,
2006 - 2010. See the GUIDE_DOCS file within this file for more details of
what the various census codes mean (could be used to compare the Focus:
HOPE area to the rest of Detroit or MI)

NOTE: They have far more data than they have provided (including
census shape files, Detroit City Budget, local restaurants, and location of
Detroit's Alternative Food Access Programs); it just wouldn't all fit in the
Dropbox. See the Data Ambassadors for the full list.
Google Doc Guide to Focus:HOPE data sources:
(https://docs.google.com/a/umich.edu/spreadsheet/ccc?key=0AknN2a_xojvPdGVlUHRnZ295MmlWVVduTWpnQnJ6cHc#gid=0)
Topics to analyze
1. Participant data

Tom Peppard

(Geolocation team?)
2. Demographic profiles for HOPE Village Initiative

Mikko Tuomela - can also help with visualizing results

Drew

Tom Sastastic

Whitney
3. Transit/access/safety (safe routes to school, traffic safety problems, W.
Davison)

James Warila

Chad
(Contact information of the data scientists who worked on the Focus Hope
project on Saturday)
Material for A2 Data Dive

FocusHOPE Brainstorming Google Doc
(https://docs.google.com/a/umich.edu/document/d/1qa0Jf4BDy7rJDwZ5iupxbClBlc8Vz69bKgvD0gu4g_4/edit?pli=1#heading=h.i638lwx8hvw8)

Map outlining Hope Village Initiative Boundaries
(https://maps.google.com/maps/ms?msid=212886075492773777177.0004b8b337ab96aa83194&msa=0&ll=42.398742,-83.121099)
Questions
General

What can we say about the participants of the various Focus: HOPE
programs geographically?

Are participants attending multiple programs?

What can we say about the neighborhood of the Hope Village Initiative in
relation to the rest of Detroit, or to MI?

Who are we impacting currently and how can we use that information to
impact others?
Census data
1. Economy and self-sufficiency

employment

value of homes

income
2. Education
3. Environment

vacancy

occupancy

who has moved out
Additionally: What is a typical child's experience? What is a typical senior's
experience?
Data Prep
Creating Unique IDs for program participants:
MASTER Participant ID.xlsx contains a sorted list of all the participants from the
various program spreadsheets (for the privacy of the participants, this file will not be
distributed). The following instructions walk you through the process of
eliminating the repeated participants using Excel. The purpose of creating
UniqueIDs across all of the spreadsheets is to see if we can identify participants
who attended multiple programs.
Once I had all the addresses together (which included repeated addresses for
participants who attended multiple programs), in Excel I selected "Data,"
then "Advanced Filter," and a window popped up. Under "Action" I selected "Copy to
another location." Then for "List Range" I selected all of the addresses (including
repeats); next, under "Copy To," I selected a cell that was not in the column where my
data was coming from. Lastly, I checked the box that says "Unique records only."
Then I copied the list of unique address records to cell A2 (cell A1 was titled
"Addresses"). Cell B1 was labeled UniqueID. I started the IDs at 100. Excel
allows you to increment number values by dragging the autofill handle down the column (for
more information see:
http://spreadsheets.about.com/od/a/g/autofill_def.htm)
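The deduplicate-and-number step above can also be sketched outside Excel. The following is a minimal Python illustration, not the procedure the team actually used; the function name `assign_unique_ids` and the sample addresses are hypothetical. It numbers addresses in order of first appearance, so passing in a sorted list reproduces a sorted master list.

```python
def assign_unique_ids(addresses, start=100):
    """Map each distinct address to an ID, numbering from `start`.

    Repeated addresses keep the ID they were given the first time,
    which mirrors filtering to "Unique records only" and then
    autofilling a counter down the ID column.
    """
    ids = {}
    for address in addresses:
        if address not in ids:
            ids[address] = start + len(ids)
    return ids


# Hypothetical sample data: the second "1200 EXAMPLE ST" is a repeat.
sample = ["1200 EXAMPLE ST", "300 SAMPLE AVE", "1200 EXAMPLE ST"]
master_ids = assign_unique_ids(sample)
```

Note that this comparison is exact, so two spellings of the same address would receive different IDs unless the addresses are cleaned first.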
Once I had built the unique IDs, I could go back to the individual program
participant workbooks and add the unique ID to each record using Excel's
INDEX and MATCH functions. I use MATCH to identify *where* on the
spreadsheet the address I want to match is, and I use INDEX to return the value
of the matching unique ID. In each workbook I created a UniqueID column and
entered this formula to match the IDs to the addresses:
=INDEX('[MASTER Participant ID.xlsx]Sheet1'!$A$1:$B$1291,MATCH($B2,'[MASTER Participant ID.xlsx]Sheet1'!$A$1:$A$1291,0),2)
After the UniqueIDs were copied over using this formula, I selected the cells with
this formula in them, copied them (Ctrl-C), and then in the same place chose "Paste
Special" (under Edit) and selected "Values." This removes the formula,
makes the worksheet run faster (since it isn't looking in another workbook for
information), and leaves the IDs hard coded.
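For readers working outside Excel, the INDEX/MATCH lookup amounts to an exact-match dictionary lookup. The sketch below is an illustration under assumptions: the field names `Address` and `UniqueID`, the function name, and the sample rows are hypothetical, and an unmatched address yields `None` where Excel would show `#N/A`.

```python
def attach_unique_ids(records, master_ids, address_field="Address"):
    """Return copies of the records with a UniqueID field added.

    master_ids maps an address string to its ID (the role played by
    the master workbook); the lookup is an exact match, like
    MATCH(..., 0) in the Excel formula.
    """
    result = []
    for record in records:
        row = dict(record)  # copy so the caller's data is not modified
        row["UniqueID"] = master_ids.get(row[address_field])
        result.append(row)
    return result


# Hypothetical master list and program rows.
master = {"1200 EXAMPLE ST": 100, "300 SAMPLE AVE": 101}
program_rows = [{"Address": "300 SAMPLE AVE"}, {"Address": "9 UNKNOWN RD"}]
with_ids = attach_unique_ids(program_rows, master)
```

Because the result holds plain values rather than formulas, there is no equivalent of the "Paste Special" step.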
One limitation of this approach is that the unique ID is
based on addresses and not names, so different participants listed at the
same address (e.g., an apartment complex) will share a unique ID. For the
purposes of mapping this shouldn't be a big deal, but it may overstate the
number of participants who appear to have attended multiple programs.
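Once each record carries a unique ID, the question of who attended multiple programs can be sketched as a grouping step. This is an illustrative Python sketch, not part of the original workflow; the field names `UniqueID` and `program` and the sample rows are hypothetical. Because the IDs are address-based, a flagged ID may be one person in two programs or two people at one address.

```python
from collections import defaultdict


def programs_by_id(rows):
    """Collect the set of programs seen for each unique ID."""
    seen = defaultdict(set)
    for row in rows:
        seen[row["UniqueID"]].add(row["program"])
    return seen


def multi_program_ids(rows):
    """Return the sorted IDs that appear in more than one program."""
    return sorted(
        uid for uid, programs in programs_by_id(rows).items() if len(programs) > 1
    )


# Hypothetical rows: ID 100 appears in two programs, ID 101 in one.
rows = [
    {"UniqueID": 100, "program": "EL1"},
    {"UniqueID": 100, "program": "SBA"},
    {"UniqueID": 101, "program": "EL1"},
    {"UniqueID": 101, "program": "EL1"},
]
flagged = multi_program_ids(rows)
```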
Anonymizing Participant Addresses
To protect the privacy of the participants, and to more closely match data from
Crimemapping.com, I used Excel formulas again to change everything but the
first two digits of an address to zeros. We can get Excel to give us the house number in
an address (5 digits, 3 digits, etc.) by finding the first space in the address field, counting from the left, and subtracting one:
=LEFT(A2,FIND(" ",A2)-1)
With that number we can add trailing zeros using a call to the REPLACE
function together with the REPT (repeat) function. Given the length of the number, and
the fact that we want to keep the first two digits in place, we replace the remaining digits
with zeros:
=REPLACE(C3,3,LEN(C3)-2,REPT("0",LEN(C3)-2))
For numbers that had 3 digits or fewer, we replaced everything but the first digit:
=REPLACE(C3,2,LEN(C3)-1,REPT("0",LEN(C3)-1))
Putting them together in the same cell formula gives you:
=IF(LEN(C3)>3,REPLACE(C3,3,LEN(C3)-2,REPT("0",LEN(C3)-2)),REPLACE(C3,2,LEN(C3)-1,REPT("0",LEN(C3)-1)))
To then put the entire address together you can use the CONCATENATE
function (keeping in mind that not all of the parts will be in the same place as this
formula). Essentially what it does is add each argument (cell) together in the
order that you list them. In the following function, I combine CONCATENATE with
the REPLACE function to update the new address numbers.
=CONCATENATE(D2, (REPLACE(A2,1,LEN(C2),"")), ", ",B2)
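The masking rule described above (keep two leading digits when the house number has more than three digits, otherwise keep one) can be expressed compactly in Python. This is a sketch of the same rule, not the spreadsheet itself; the function names and sample addresses are hypothetical. One assumption differs from the Excel formulas: an address with no leading numeric house number is returned unchanged here.

```python
def mask_house_number(number):
    """Zero out all but the leading digit(s) of a house number string."""
    keep = 2 if len(number) > 3 else 1
    return number[:keep] + "0" * (len(number) - keep)


def anonymize_address(address):
    """Mask the house number at the start of a street address."""
    number, space, rest = address.partition(" ")
    if not space or not number.isdigit():
        # Assumption: leave addresses without a leading number as they are.
        return address
    return mask_house_number(number) + space + rest


# Hypothetical examples of the two branches of the rule.
long_number = anonymize_address("1234 EXAMPLE ST")
short_number = anonymize_address("123 SAMPLE AVE")
```

As in the Excel version, a one-digit house number is left unchanged because there is nothing after the first digit to replace.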
Data Processing
Geo-coding address data
File: geocoded_addresses2.csv [ADD LINK]
We used Stata for the following procedures.
Merging
We merged the following files:

Candidate Address (cleaned)

Center for Working Families (cleaned)

Current Students Addresses 2011 CAT (cleaned)

EL1 Addresses (cleaned)

EL2 Addresses (cleaned)

HVIBoundaryContactList (cleaned)

SBA Data_10.01.11 (cleaned)

SBA Data_10.01.11 (sheet 2) (cleaned)

WiMaxSBAs_10.01.11 (cleaned)
In the merged data we added a field to indicate which file the address came from.
The files are mapped as follows:

Candidate Addresses - CAT Candidates

Center for Working Families - CFWF

Current Students Addresses 2011 CAT - CAT Current Students

EL1 Addresses - EL1 Cohort 1

EL2 Addresses - EL2 Cohort 2

HVIBoundaryContactList - HVI Touched Resident

SBA Data_10.01.11 - SBA

SBA Data_10.01.11 (sheet 2) - SBA

WiMaxSBAs_10.01.11 - SBA WiMax
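The stacking of files with a source label can be sketched as follows. The actual merge was done in Stata; this Python version is only an illustration, and the field name `program`, the function name, and the sample rows are hypothetical. The label mapping is taken from the list above.

```python
# File name -> program label, as listed above.
PROGRAM_LABELS = {
    "Candidate Addresses": "CAT Candidates",
    "Center for Working Families": "CFWF",
    "Current Students Addresses 2011 CAT": "CAT Current Students",
    "EL1 Addresses": "EL1 Cohort 1",
    "EL2 Addresses": "EL2 Cohort 2",
    "HVIBoundaryContactList": "HVI Touched Resident",
    "SBA Data_10.01.11": "SBA",
    "SBA Data_10.01.11 (sheet 2)": "SBA",
    "WiMaxSBAs_10.01.11": "SBA WiMax",
}


def merge_program_files(files):
    """Stack rows from several files, tagging each row with its program.

    `files` maps a file name to a list of row dictionaries.
    """
    merged = []
    for file_name, rows in files.items():
        label = PROGRAM_LABELS[file_name]
        for row in rows:
            merged.append({**row, "program": label})
    return merged


# Hypothetical input with one row per file.
merged = merge_program_files({
    "EL1 Addresses": [{"UniqueID": 100}],
    "SBA Data_10.01.11 (sheet 2)": [{"UniqueID": 101}],
})
```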
Duplicates
We found multiple duplicate uniqueIDs (for example, 669 in file EL2 Addresses).
The uniqueIDs are unique to each address. Each occurrence represents an
individual, so duplicates may be multiple people at the same address.
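Counting how many records share each uniqueID is a simple tally. The sketch below is a Python illustration rather than the Stata commands used; the field name `UniqueID` and the sample rows are hypothetical (the ID 669 echoes the example above, but the counts are made up).

```python
from collections import Counter


def people_per_id(rows, id_field="UniqueID"):
    """Count how many records (people) carry each unique ID."""
    return Counter(row[id_field] for row in rows)


def shared_ids(rows, id_field="UniqueID"):
    """Return only the IDs that occur more than once, with their counts."""
    counts = people_per_id(rows, id_field)
    return {uid: n for uid, n in counts.items() if n > 1}


# Hypothetical rows: two people share ID 669, one person has ID 670.
rows = [{"UniqueID": 669}, {"UniqueID": 669}, {"UniqueID": 670}]
duplicates = shared_ids(rows)
```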
Cleaning
To clean the data for geo-coding, we did the following:

Dropped cases where the uniqueID was missing or the address was missing.

Cleaned the fields to get rid of leading blanks.

Converted everything to uppercase (it helps to have everything in the same
format).

Removed bad characters (e.g. Ê Ê, `) manually.
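The cleaning steps listed above can be sketched in Python as follows. This is not the Stata code that was run; the field names `UniqueID` and `Address` are hypothetical, and two choices go beyond the list: bad characters are removed automatically by keeping only printable ASCII (the team did this by hand), and repeated internal spaces are collapsed.

```python
def clean_rows(rows):
    """Drop incomplete rows and normalize the address text."""
    cleaned = []
    for row in rows:
        uid = row.get("UniqueID")
        address = (row.get("Address") or "").strip()
        if uid in (None, "") or not address:
            continue  # missing uniqueID or missing address
        # Assumption: keep printable ASCII only and drop backticks.
        address = "".join(ch for ch in address if " " <= ch <= "~" and ch != "`")
        # Uppercase and collapse any runs of whitespace.
        address = " ".join(address.upper().split())
        cleaned.append({**row, "Address": address})
    return cleaned


# Hypothetical rows covering each rule.
raw = [
    {"UniqueID": 100, "Address": "  12 exampleÊ st`"},
    {"UniqueID": None, "Address": "5 sample ave"},
    {"UniqueID": 101, "Address": ""},
]
cleaned = clean_rows(raw)
```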
Geo-coding
To get latitude and longitude for each address, we used a function in Stata that
makes a call to Google Maps.
Our output has the following:

address

number of people at that address

geocode (Google Maps status code -- e.g. 200 = no errors)

geoscore (Google Maps accuracy level -- e.g. 8 = street-level accuracy)

Note: we dropped addresses that had a geoscore less than 8 (meaning that the
addresses were located less precisely than street level according to Google's
output)

latitude

longitude
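The filtering rule can be sketched as a single pass over the geocoder output. This Python sketch is illustrative only; the field names follow the output list above, the sample records and coordinates are made up, and also requiring a status code of 200 is an assumption (the text only mentions dropping low geoscores).

```python
def keep_reliable(results, min_score=8, ok_status=200):
    """Keep records geocoded without error at or above min_score."""
    return [
        r for r in results
        if r["geocode"] == ok_status and r["geoscore"] >= min_score
    ]


# Hypothetical geocoder output.
results = [
    {"address": "1200 EXAMPLE ST", "geocode": 200, "geoscore": 8,
     "latitude": 42.0, "longitude": -83.0},
    {"address": "300 SAMPLE AVE", "geocode": 200, "geoscore": 6,
     "latitude": 42.1, "longitude": -83.1},
    {"address": "9 UNKNOWN RD", "geocode": 602, "geoscore": 0,
     "latitude": None, "longitude": None},
]
reliable = keep_reliable(results)
```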
Mapping the address data
We used the program MapInfo and the geocoded address output from the above
section to visualize the geo-spatial data.
We used the following files from Data Driven Detroit to map the area boundaries:

A2D2_Area_Boundary.dbf

A2D2_Area_Boundary.prj

A2D2_Area_Boundary.sbn

A2D2_Area_Boundary.sbx

A2D2_Area_Boundary.shp

A2D2_Area_Boundary.shp.xml

A2D2_Area_Boundary.shx
We had trouble mapping the individual programs (e.g. CFWF) from the merged
file, so we split the programs into separate csv files and imported those to
MapInfo.
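Splitting the merged file into one CSV per program can be sketched with the Python standard library. This is an illustration, not the procedure the team used; the field name `program`, the function names, and the sample rows are hypothetical. The CSV text is built in memory here, and writing it to a file per program is left to the caller.

```python
import csv
import io
from collections import defaultdict


def split_by_program(rows, program_field="program"):
    """Group rows by program label."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[program_field]].append(row)
    return dict(groups)


def to_csv_text(rows):
    """Serialize a non-empty list of row dictionaries as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical merged rows.
merged = [
    {"address": "1200 EXAMPLE ST", "program": "CFWF"},
    {"address": "300 SAMPLE AVE", "program": "SBA"},
]
groups = split_by_program(merged)
cfwf_csv = to_csv_text(groups["CFWF"])
```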
Fusion Table Layers

HVI Boundary Contact List
(https://www.google.com/fusiontables/DataSource?snapid=S3939059A2d)

EL1 Addresses
(https://www.google.com/fusiontables/DataSource?snapid=S3939062G9p)

EL2 Addresses
(https://www.google.com/fusiontables/DataSource?snapid=S393907lmuB)

Detroit Parcels
(https://www.google.com/fusiontables/DataSource?snapid=S393908X_KS)

FH HVI Boundary
(https://www.google.com/fusiontables/DataSource?snapid=S393910m5fx)

Census Demo1 + ACS Block Group data
(https://www.google.com/fusiontables/DataSource?snapid=S393911b1cz)

All Focus:HOPE Program Data
(https://www.google.com/fusiontables/DataSource?snapid=S394189R8Ih)

ACS Demographics for Detroit – Tract
(https://www.google.com/fusiontables/DataSource?snapid=S394191Z5mq)
Drew did a lot of work using the Google Fusion Table Layer Wizard to add
additional layers to the participant data map.
In the Dropbox folder he used the following files:

"web_demo_files" folder - contains the html file, which pulls in the
information from the Google Fusion tables to build the interactive map

"acsFocusHopeDictionary.csv" and "acsFocusHopeEstimates" files - contain
the median income data that the Layer Wizard pulled in

"combined_program_participant_files" folder - which has the combined
spreadsheets of the participant data, which identifies the points on the map
with the various programs
There is a web site that allows these data to be viewed
interactively: http://projects.datadrivendetroit.org/FHdatadive/
Findings and Output
It appears that a substantial number of participants in student programs
live outside of the HVI area, and that about one-third of participants
have graduated from high school. There are high rates of poverty.
For the stretch of Davison from Dexter to Rosa Parks (data from 2006-2010;
note that the numbers are cumulative: 21 accidents in 5 years over a one-mile
stretch of road): Traffic accidents were unusually frequent in
the middle of the day and in October. Car-pedestrian accidents are high,
while car-car accidents are nil. Traffic injuries seem to be highest (at least per the reports
available) between 1 and 2 pm. Mortality from these accidents is low (3), but injuries
are high (18) relative to the number of accidents. This may be useful for
school outreach if pedestrian access is necessary for children; this is one of the
areas with the most car-pedestrian accidents in the Detroit area. Notably, it appears that
2009-2010 experienced lower rates of accidents than did prior years. Data have
been uploaded to Dropbox.
A visit to the neighborhood found residents using the pedestrian crossings on
Davison Street to access the supermarket. Also, there is a newer-looking
button-activated pedestrian crossing system that provides a countdown timer for the
period prior to "don't walk." Is it possible that this new system could explain the
drop in accidents in 2009-2010?
SEMCOG Traffic Data and Intersection Crash Frequency
(http://www.semcog.org/Data/Maps/roads.map.cfm)
Another observation from the map of participant data, related to the issues of
traffic accessibility, is that many more program participants are clustered in the
HVI neighborhood south of Oakman, while there are many fewer participants in
the region north of Oakman. See the map on Data Driven Detroit
(http://projects.datadrivendetroit.org/FHdatadive/). One potential explanation (or
question to ask) is whether the limited number of streets crossing the
industrial zone north of Focus:HOPE makes it difficult for residents to access the
facility on Oakman. Many of the companies in this industrial strip are fenced in. In
combination with large fenced facilities such as the former Malcolm X/Robeson
Academy, which occupies large acreage to the north of the facilities, this means the number of
pathways from the neighborhood in the north of the HVI area to Focus:HOPE
is small. Facilities for children, such as the Ben Hill and Salsingar playgrounds,
don't appear to have a direct pedestrian route to Focus:HOPE. This hypothesis
would suggest that the HOPE Village Initiative's "Board Up/Clean
Up/Neighborhood Beautification" objective could be served by projects that
increase the pedestrian/bike accessibility of the northern HVI to Focus:HOPE
and the Davison Corridor. Also, on numerous residential streets throughout the
HVI, there appears to have been illegal dumping of tires, furniture, and other
debris. Could these be barriers to walkability?
One additional follow up. Using the "Smart Street Walk Score" on
walkscore.com, it appears that most of the HVI area south of Focus:HOPE can
reach the facility within a 15 minute walk. The northern HVI area (surrounding
Fenkell) cannot reach Focus:HOPE within a 15 minute walk.
Presentations
You can find presentations from the Focus HOPE data divers at: A2DataDive:
FocusHOPE Final Presentation (http://www.slideshare.net/openmichigan/a2datadive-focus-hope-final-presentation). Special thanks to Nikki for putting this
together.
Questions for Further Research

How do we target programs toward the population returning from incarceration?

How to quantify how people are getting involved in improving their own
neighborhoods?

Who are the people in the geo area and/or in the schools that are NOT being
served by the Focus programs? (If these people live in the same
neighborhoods, this may or may not be a target group for future allocation of
resources)

What are the social, physical, and cultural barriers to participation in Focus
programs for residents in these identified neighborhoods? Along this line, we
noticed on the map that there is a nice saturation in the areas just south of
the center, where people are accessing services, but less saturation to
the north of the center. We tried to call Debbie but got no answer. We wondered why people
to the north are not accessing services: is it that they don't need them? That they
aren't aware of them? That they can't access them for some reason? We can't answer these
questions without further guidance from Focus/Debbie.

How do HVI participants in Focus:HOPE travel to the facility? Are there
barriers, such as traffic safety or limited "walkability" (e.g., limited north/south
crossings of the industrial area near Focus:HOPE; illegal dumping on
residential streets), that prevent people from making use of Focus:HOPE or
other neighborhood attributes?
Recommendations for Future Data Collection

Future participant sheets should have Apartment number (etc) in separate
field

Possibly set up a Microsoft Access Database

Access could handle unique IDs for you and match people across
programs.

You can also set up a data entry interface (not ideal), which would allow
you to by-pass having to use a lot of SQL queries.
Take Advantage of Free Tools (Google Fusion Tables)
You can quickly create a map from a spreadsheet using Google Fusion Tables
(https://www.google.com/fusiontables/Home/). First, you have to save the
spreadsheet as a CSV (comma-separated values) file. You can do this through
the "Save As" command in Excel. Then, go to Google Fusion Tables
(https://www.google.com/fusiontables/Home/) and click the "Create"
button. Select "Table" from the list. Upload the spreadsheet.
An in-depth tutorial can be found
at: http://blog.apps.chicagotribune.com/2010/03/04/quickly-visualize-and-map-a-data-set-using-google-fusion-tables/