Archival/Research Database Survey

Community Shelter Board
Research/Archival Database Project
Final Report
FINAL REPORT
Archival/Research Database for CSB
Overview
The Community Shelter Board of Columbus, Ohio established a management information system
for homeless service tracking in 1989. Since that time Columbus’ homeless management
information system (HMIS) has evolved to include client-level data about recipients of prevention,
emergency shelter, transitional housing, permanent supportive housing, and transitional financial
assistance.
The HMIS coverage includes nearly all homeless assistance providers and programs in the
Columbus metro area. The HMIS has consistently captured a core set of data elements for each
client served which include demographics, services provided/received, and outcome. Additional
client information and project-specific data elements have been collected and added to the HMIS
at various times throughout the past fifteen years to facilitate ongoing program evaluation,
program monitoring and research initiatives.
In 2001 CSB upgraded its HMIS from a locally managed, DOS-based system with dial-up
access to a web-based system in ServicePoint software managed off site by Bowman Internet
Systems. Data from 1989 through 2001 in the DOS-based system was not migrated to the
currently used ServicePoint product. Consequently CSB now has two different systems of HMIS
data, although the data sets contained in each are based on the same data elements and client
response categories.
Because Columbus is one of the few communities in the United States with high quality,
historical homeless system data, tremendous opportunities exist for sophisticated data analysis
and research. However, the two existing datasets are not currently managed in a single, uniform
and consistent system.
This report evaluates commonly accepted organization standards for research databases and
makes recommendations for database technical specifications and management protocols.
Key Informants Interviewed
Steve Poulin
University of Pennsylvania, Center for Mental Health Policy & Services Research
Marti Burt
Urban Institute
Brian Sokol
University of Massachusetts – Boston, McCormack Center for Social Policy Research
Nancy Smith, Derek Coursen
Vera Institute of Justice
Michael Banish, Marcus Mattson
Community Research Partners
Eric Grumdahl
Hearth Connection
Ellen Bassuk, Nick Huntington
National Center on Family Homelessness
Shared by Columbus OH on HMIS.Info/Database Research
Ed Malecki, Hazel Morrow-Jones
The Ohio State University, Center for Urban and Regional Analysis
Matt Kaliner, Copeland Young
Harvard University, Radcliffe Institute for Advanced Study – Murray Research Center
Database Structure and Format
Data format → relational database
The preferred format for an archival database is one in which the integrity of the original data
structure is maintained. Since the original data sources are both organized in relational
tables, the archival database should also maintain the data relationships and structure in a
relational database.
Key informants also recommended that CSB maintain a detailed and up-to-date codebook
that includes an “entity relationship diagram”, providing clear descriptions of the relationship
among data elements within and among tables.
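The relational structure described above can be sketched as follows. This is a minimal illustration only, assuming Python's built-in sqlite3 module as a stand-in for whichever SQL product CSB selects. The table and column names (client, program, program_stay, source_system) are hypothetical and are not the actual DOS-based or ServicePoint schema.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not the actual HMIS tables.
SCHEMA = """
CREATE TABLE client (
    client_id   INTEGER PRIMARY KEY,
    birth_year  INTEGER,
    gender      TEXT
);
CREATE TABLE program (
    program_id   INTEGER PRIMARY KEY,
    program_name TEXT NOT NULL,
    program_type TEXT NOT NULL   -- e.g. emergency shelter, transitional housing
);
CREATE TABLE program_stay (
    stay_id          INTEGER PRIMARY KEY,
    client_id        INTEGER NOT NULL REFERENCES client(client_id),
    program_id       INTEGER NOT NULL REFERENCES program(program_id),
    entry_date       TEXT NOT NULL,   -- ISO 8601 date
    exit_date        TEXT,            -- NULL while the stay is open
    exit_destination TEXT,
    source_system    TEXT NOT NULL    -- 'DOS' or 'ServicePoint'
);
"""

def build_archive(path=":memory:"):
    """Create an empty archive database with foreign keys enforced."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn

conn = build_archive()
conn.execute("INSERT INTO client VALUES (1, 1970, 'F')")
conn.execute("INSERT INTO program VALUES (10, 'Example Shelter', 'emergency shelter')")
conn.execute(
    "INSERT INTO program_stay VALUES "
    "(100, 1, 10, '2003-01-05', '2003-01-20', 'rental housing', 'ServicePoint')"
)
```

Keeping clients, programs, and stays in separate tables linked by keys is what allows the archive to preserve the original relationships. The foreign key constraints would also give the entity relationship diagram something concrete to document.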
Back end environment → SQL
The recommended software for storing data archived from a relational database is SQL [1].
SQL is widely regarded as the recommended approach because of the compatibility, reliability,
and flexibility it provides.
• Compatible – multiple SQL products are widely supported and understood by the
largest proportion of database administrators. SQL also allows the archived data to be
stored in a relational table format that maintains the integrity of the original data
source.
• Reliable – provides a fast, secure, and robust tool for integrating new data during
regular updates to the archival database from the live, interactive ServicePoint site.
• Flexible – export scripts can easily be written to transfer data in various formats (ASCII,
tab delimited, flat file, relational tables) and subsets (subpopulations, program types,
profiles, etc.). SQL also provides the ability to integrate data into multiple formats and
software products depending on research needs and staff preference. Data held in SQL
can be transferred to text files in ASCII format, Microsoft Access, Microsoft Excel, Crystal
Reports, FileMaker, dBase, SPSS, SAS, and many others.
Front end software → SQL, Access, SPSS
The recommendation for front end software is less definitive. Respondents were
split in their answers to this question depending on their particular preference and expertise.
Database administrators preferred SQL. Academic researchers preferred SPSS. Data
analysts and general policy analysts preferred Access.
SQL may seem to be the best option if the back end is built in SQL, but only if CSB can
support the database with staff appropriately trained and experienced in SQL database
programming and administration. SQL certified staff are highly skilled and tend to demand
the highest salaries of all database administrators.
[1] Structured Query Language (SQL) is a standard interactive and programming language for sending
queries to, getting information from, and updating a relational database. Although SQL is a standard used
in database administration, many database products support SQL with proprietary extensions to the
standard language. Examples of database products that support SQL include dBase, Oracle, and SQL
Server. SQL is the industry standard for storage, management, and administration of relational databases. As the
industry standard SQL provides the best performance, scalability, and reliability for management of
relational databases.
Other options for the front end interface of data management include basic database
software packages such as Access. Access is highly flexible and an easily transferable
product. Although data analysis is possible with Access, other products provide more
sophisticated and robust reporting and analysis functionality.
SPSS is the optimal choice for advanced data analysis. SPSS is constructed in a flat
environment (client identifiers and demographics are repeated for each individual record or
client contact). A flat environment can lead to cumbersome and unwieldy databases for very
large data sets. SPSS is not as common as SQL or Access, and it can be more difficult to
find staff with appropriate training and expertise in SPSS.
If the back end architecture is built in SQL, the choice of front end software packages is less
critical. SQL is the industry standard for storage, management, and administration of
relational databases, affording enough flexibility, reliability, and scalability that the choice of
the front end system becomes one of personal choice and CSB staff preference.
Additionally, many of the standard queries and reports that CSB will use on a regular basis
can be preprogrammed in SQL, requiring input of basic date ranges to extract those data that
CSB require.
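A preprogrammed report of this kind might look like the sketch below, in which the query text is fixed and only the date range is supplied at run time. It assumes Python's built-in sqlite3 module and a hypothetical program_stay table. The report name, columns, and sample rows are illustrative and do not reflect CSB's actual reports or data.

```python
import sqlite3

# Hypothetical table and column names, for illustration only.
STAYS_BY_PROGRAM_TYPE = """
SELECT program_type, COUNT(*) AS stays, COUNT(DISTINCT client_id) AS clients
FROM program_stay
WHERE entry_date >= :start AND entry_date < :end
GROUP BY program_type
ORDER BY program_type
"""

def stays_by_program_type(conn, start, end):
    """Run the stored report for entry dates in [start, end), ISO 8601 strings."""
    return conn.execute(STAYS_BY_PROGRAM_TYPE, {"start": start, "end": end}).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE program_stay (client_id INTEGER, program_type TEXT, entry_date TEXT)"
)
conn.executemany(
    "INSERT INTO program_stay VALUES (?, ?, ?)",
    [
        (1, "emergency shelter", "2003-01-05"),
        (1, "emergency shelter", "2003-03-10"),
        (2, "emergency shelter", "2003-02-01"),
        (3, "transitional housing", "2003-02-15"),
        (4, "transitional housing", "2004-01-01"),  # outside the range used below
    ],
)
report = stays_by_program_type(conn, "2003-01-01", "2004-01-01")
```

Passing the dates as named parameters rather than pasting them into the query text keeps the stored report unchanged from run to run, so staff would only need to supply the reporting period.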
Contents of Research Database
The required data elements for the research database should comply with recently developed
US Department of Housing and Urban Development (HUD) Data and Technical Standards for
HMIS. These standards identify the specific data elements, data collection and reporting
protocols, sharing of protected personal information (PPI), covered homeless organizations
(CHO) in the HMIS, privacy and security standards, allowable uses of HMIS data, application
and system security, and electronic transmission requirements. The archival dataset of
CSB’s HMIS is a covered entity under the new Data and Technical Standards for HMIS. All
applicable standards and requirements for HMIS are equally relevant to the archival
database.
The collection of specific data elements may vary according to program type. For instance
programs that serve families in a transitional housing setting may collect sets of data that are
different from programs that serve single adults in an emergency shelter setting. The
contents of the Research Database will be defined according to program type, rather than a
single data set for every client encounter. In general, however, a standard set of universal
data elements will include basic demographics and service utilization.
Data Elements → client identifiers, demographics, related household data, disability
status, program data (including entry and exit dates), residential
history, client outcomes, and exit destination.
Key informant responses generally recommended that individual case planning data beyond
the HUD-required universal data elements and program-specific data elements should not be
included in the archival database. Client-level case management information not included in
the database would include case notes, status reports, treatment plan monitoring, and any
clinical information relating to the planning, monitoring, and documentation of service
provision.
Database Management
Frequency of data dumps from live site to archival site → semi-annual
CSB intends to use the live, interactive ServicePoint database to manage many of their
regular reporting functions which include funder-required summary reports, program-specific
summary activity reports in aggregate by various time periods, population-specific summary
activity in aggregate by various time periods, monthly trend analysis by program and subpopulation, and additional ad hoc demographic, utilization, and accounting reports. CSB is
heavily dependent on a constant stream of quality data to monitor program utilization and
client outcomes.
Reports from the archival/research database will only be extracted for purposes of research,
large-scale evaluation, rigorous data analysis, and monitoring of long-term trends. These
types of reports are run more efficiently from a research database specifically designed for
these purposes.
After the initial communication protocols and importing scripts are developed that allow data
to be archived from the ServicePoint site to a CSB-managed research database, the
frequency of subsequent data dumps is contingent on CSB’s ability to conduct
comprehensive data quality testing and cleansing. The labor intensive nature of data
monitoring, cleansing, and auditing processes is such that these data quality processes can
only be conducted on a quarterly basis. Therefore, it is recommended that CSB conduct
semi-annual data integration (‘dumps’) from the live ServicePoint site to the archival research
database, allowing adequate time for data quality issues to be addressed. Data dumps more
frequent than semi-annually do not allow for the rigorous and comprehensive quality
assurance testing and data cleansing that CSB currently conducts.
Frequency of data purging from the live site → annually
The live ServicePoint site will benefit from the deletion of records that are determined to be
“inactive”. Homeless assistance providers will find ServicePoint easier to navigate and faster
to use with the purging of client records that are no longer accessed by homeless assistance
providers in the provision of case planning and client management. The definition of
“inactive” will need some input and discussion from all interested parties to strike a balance
between usability and completeness. Ultimately, the “inactive” designation should be
determined based on a rigorous review of current client utilization patterns. Any data purged
from the live site should include a comprehensive audit trail that indicates what specifically
was purged, who did it, and the criteria used to make the purge determination.
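The audit trail requirement can be illustrated with a short sketch. It assumes Python's built-in sqlite3 module, hypothetical client and purge_audit tables, and a simple date cutoff as the "inactive" rule. The actual definition of "inactive" is still to be agreed, and the live site is vendor-managed, so this shows only the shape of the record keeping.

```python
import sqlite3
from datetime import datetime, timezone

def purge_inactive(conn, cutoff, purged_by):
    """Delete clients with no activity before `cutoff`, logging each deletion."""
    criteria = f"last_activity < {cutoff}"  # hypothetical rule for "inactive"
    now = datetime.now(timezone.utc).isoformat()
    with conn:  # one transaction: audit rows and deletions succeed or fail together
        rows = conn.execute(
            "SELECT client_id FROM client WHERE last_activity < ?", (cutoff,)
        ).fetchall()
        conn.executemany(
            "INSERT INTO purge_audit (client_id, purged_by, purged_at, criteria) "
            "VALUES (?, ?, ?, ?)",
            [(cid, purged_by, now, criteria) for (cid,) in rows],
        )
        conn.execute("DELETE FROM client WHERE last_activity < ?", (cutoff,))
    return len(rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE client (client_id INTEGER PRIMARY KEY, last_activity TEXT)")
conn.execute(
    "CREATE TABLE purge_audit "
    "(client_id INTEGER, purged_by TEXT, purged_at TEXT, criteria TEXT)"
)
conn.executemany(
    "INSERT INTO client VALUES (?, ?)",
    [(1, "1999-06-30"), (2, "2004-02-11"), (3, "2000-12-01")],
)
purged = purge_inactive(conn, "2002-01-01", "db_manager")
```

Each audit row records what was purged, who did it, when, and under which criteria, which are the elements the recommendation calls for.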
Backups and Storage
Key informants recommend that CSB follow standard practices for backup and storage of
data off site as an extra but necessary precautionary measure. Additionally, any SQL code
that describes database structure, relationship codes, programming, reports, etc. should also
be saved on backup disks and stored off site. The processes employed for the backup and
storage of the archival/research database can be incorporated into existing CSB server and
database backup procedures.
Projected staffing and management activities:
Phase 1 (start-up, 4 to 6 months)
• 206 consulting hours
Phase 2 (annual, ongoing)
• 0.5 FTE
The specific tasks associated with data modeling, designing the research database, writing
the specifications for data transfers, conducting the initial integration of all data into the new
research database, and validity testing to assure accuracy all require expertise in database
development. These activities are distinct, one-time functions and are characterized as
Phase 1.
The skill sets required to maintain the database during Phase 2 (following initial start-up) are
quite different and are more aligned with database administration. These activities include
the following:
• database management – security maintenance, backups, managing tape, cartridge, and
assorted media
• data processing – report generation, copying, editing, and logging of pre-programmed
reports
• quality management – maintain quality controls for ensuring accuracy and integrity of
data files
• data analysis – develop preliminary analyses for initial data exploration involving basic
statistical analysis (e.g. frequency distributions, correlations, chi-square tests, t-tests)
Research Request Process for Data Access
Throughout the history of CSB many researchers and universities have approached CSB
staff with requests to access the MIS for analysis, research studies, evaluation work, data
matches with mainstream administrative datasets, and general data mining. CSB has
reviewed these various requests on an ad hoc basis and made determinations to pursue
partnerships based on the merits of the research design and resources necessary to
participate. Future requests for collaborations and participation in research will increase as
CSB’s data sets are organized and designed for this purpose.
Managing research requests must be approached intentionally to ensure cost-effective,
timely, secure, and appropriate partnerships. CSB must articulate clear research objectives
and make determinations about potential research partnerships based on the quality of the
research design, ability of researcher to complete the research in a timely and cost effective
manner, and consistency with local research objectives. The following recommendations
provide a basic structure for managing research requests and facilitating productive
partnerships.
All requestors of data from CSB’s archival/research database will be required to complete a
pre-proposal concept paper. CSB staff will review pre-proposal concept papers to make
one of the following three (3) determinations:
1. The requestor’s pre-proposal has no merit and no access to the archival/research
database will be allowed.
2. The requestor’s pre-proposal has merit and only de-identified client data (aggregate
summary data) is required for the research project. CSB may provide de-identified
aggregate reports to the researcher.
3. The requestor’s pre-proposal has merit and identifiable client data is necessary for
the research project. The researcher is invited to complete a full proposal. Access to
data will be determined after successful submission of a full proposal and approval of
the Ad Hoc Committee for Review of Research Requests (described below).
The research request pre-proposal (requests for access to CSB’s administrative database
of client-level records) must be initiated by a written proposal that describes the following:
• Name, affiliation, credentials of principal researcher
• Research design that describes scope, scale, objectives, and priorities for research
• Description of data (aggregate or client-level) required for research (population or
sub-population type; program type; date ranges; specific flags, filters, cross-tabs,
etc.)
The research request full proposal must describe the following information in addition to
pre-proposal information:
• Name, affiliation, and credentials of any research assistants with access to data
• Funding source underwriting research project and/or researcher’s time
• Estimated timeframe for completion of research or data analysis
• Estimated amount of CSB staff time associated with providing explanations,
context, assistance, and technical assistance with data extraction and/or analysis
• Description of researcher’s plan to assure privacy and confidentiality of client data,
compliance with HIPAA guidelines, and plan for management, storage, and
eventual destruction of client data
Because researchers are using CSB HMIS data as a secondary data resource, no IRB
process is required. However, rigorous controls on allowing data access need to be
established. Following the successful and complete submission of a written proposal, CSB
will manage the approval process by calling a meeting of an Ad Hoc Committee for Review
of Research Requests. This Ad Hoc Committee will be charged with meeting whenever
necessary to review research request proposals and making determinations about providing
access to data based on the merits of the proposal. Membership of the Ad Hoc Committee
may include the following:
• CSB staff (Executive Director, Program Director, Database Manager)
• Consumer of homeless assistance program
• Homeless assistance provider
• Funder of homeless assistance and/or housing
• Member of the Rebuilding Lives Funder Collaborative
• Individual(s) with academic credentials representing research interests (OSU)
Any researcher granted access to CSB’s HMIS data will be required to sign a Memorandum
of Understanding (MOU) that describes the terms of the partnership, any liabilities, a plan
for dissemination of research findings, and any other conditions placed on data access.
Approved data format → ASCII file with requested data in tab delimited format
Each researcher has slightly different views on the optimal structure, data format, and
software products that are best suited for research and analysis. Opinions differ based on
the degree of training and familiarity with different statistical analysis products, the size and
scope of data, and the level of analysis. Key informants generally recommended that CSB
provide data to researchers in the most basic, simplest format possible and then require each
independent researcher to convert the data to the format of his or her choosing. Following
approval of a research request, CSB will create a copy of the requested data set on a CD in
an ASCII file in tab delimited format. CSB will also consider providing data in different
formats depending on the timing and resource requirements at CSB. CSB will also provide a
detailed code book that defines each data element within a field, how cases are organized,
and the algorithm for establishing the unique client identifier. Each case within a data set
should represent a separate shelter visit.
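An export in this format could be produced along the following lines. This is a sketch only, assuming Python's standard csv module. The field names and sample rows are hypothetical, and each row stands for one shelter visit as described above.

```python
import csv
import io

def export_tab_delimited(rows, fieldnames, out):
    """Write one line per case (shelter visit) as tab-delimited ASCII text."""
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Fail loudly on non-ASCII text rather than silently altering it.
        for value in row.values():
            str(value).encode("ascii")
        writer.writerow(row)

# Hypothetical sample data: two visits by the same client.
visits = [
    {"client_code": "A1001", "program_type": "emergency shelter",
     "entry_date": "2003-01-05", "exit_date": "2003-01-20"},
    {"client_code": "A1001", "program_type": "emergency shelter",
     "entry_date": "2003-03-10", "exit_date": "2003-03-12"},
]
buffer = io.StringIO()
export_tab_delimited(
    visits, ["client_code", "program_type", "entry_date", "exit_date"], buffer
)
text = buffer.getvalue()
```

To write to a file instead of an in-memory buffer, the same function can be given a file opened with `open(path, "w", encoding="ascii", newline="")`. The header row would then correspond to the field definitions in the code book.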
Security and Confidentiality
Protecting the privacy of identifiable data within a data set is a major concern when
making client-level data available for research purposes. Although current client consent and
release protocols in place at homeless assistance programs in Columbus are compliant with
industry standards and make adequate provision for research and general
management operations, researchers must comply with the following additional security and
confidentiality measures:
• Researchers and their affiliated institutions must comply with all applicable privacy
and security mandates such as HIPAA
• Researchers must sign a confidentiality agreement
• Anonymity of client-level data within any released data set must be protected and
maintained throughout the research and analysis process, and in any final reports,
studies, or published accounts of findings
• All data sets (original and derivative) must be destroyed upon completion of the
research
• Only researchers specifically approved by the Ad Hoc Committee will have access
to data
Preferred method of data sharing → de-identified client level data
Researchers must be able to match CSB’s client code (unique identifier) with the
researcher’s client code. As a general practice all client identifiable data will be masked or
stripped from released data sets. In extenuating circumstances when valid data matches
(linking) are not possible without the presence of individual identifiers, researchers may have
limited access to client-level data with identifiers.
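One possible way to mask identifiers while still allowing a client's records to be linked within a released data set is sketched below. The report does not prescribe a masking method. The keyed hash (HMAC-SHA256, from Python's standard hmac and hashlib modules) is an assumption introduced here for illustration, as are the field names and the sample record.

```python
import hashlib
import hmac

# Hypothetical names of fields that identify a client directly.
IDENTIFYING_FIELDS = {"name", "ssn", "date_of_birth"}

def research_code(client_code, secret_key):
    """Derive a stable pseudonymous code from the client code using HMAC-SHA256."""
    digest = hmac.new(secret_key, client_code.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]

def deidentify(record, secret_key):
    """Return a copy without identifying fields and with the client code replaced."""
    cleaned = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    cleaned["client_code"] = research_code(record["client_code"], secret_key)
    return cleaned

key = b"example-key-held-only-by-CSB"  # illustrative; a real key would be kept secret
record = {
    "client_code": "A1001",
    "name": "Jane Doe",
    "ssn": "000-00-0000",
    "date_of_birth": "1970-04-02",
    "program_type": "emergency shelter",
}
released = deidentify(record, key)
```

Because the same client code always maps to the same research code under a given key, repeat visits by one client remain linkable in the released file, while the original code cannot be recovered without the key.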
Data Quality
As a general rule survey respondents recommended that all data, even somewhat
problematic data, should be included in the research/archival database. Tolerance to
various levels of data quality will differ based on the specific research question, scope, and
analysis methodologies. The data quality standards of data within the live ServicePoint
site should also be carried over to the archival database.
Creation of Research Database
Phase 1 of the project is anticipated to take a total of 4 to 6 months and require the following
activities:
Provenance – define the origin of each data element within the database (DOS-based
FirstLink system vs. ServicePoint system); document the location,
relationship, and any issues or problems impacting future data
interpretation or analysis. Much of the documentation currently exists for
this activity. Data provenance is expected to take 2 to 4 weeks.
Data modeling – define data variables, relationships, and logic of database organization;
conceptualize database structure; construct an entity relationship
diagram for the new research/archival database. Data modeling is
expected to take 4 to 6 weeks.
Database development – write the SQL program schema for data migration to archival
database from the live site and from the DOS-based system. Both data
sources should be integrated into the new research/archival database.
Testing of the migration process should be conducted and the process
verified before final integration is accepted. Database development is
expected to take 4 to 6 weeks.
Analysis of problematic data – Client data from the period 10-1-01 through 6-30-02 is of
questionable quality, completeness, and consistency. This period
represents the initial transition time from use of the DOS-based system
for client management to the ServicePoint site. Data from this time
period will need special attention to determine their “fitness” for
conversion into the research/archival database. This analysis is
expected to take 4 to 6 weeks.
Data migration – run the SQL procedures that join, filter, and convert data. This process
is expected to take less than 1 week.
Build import/export reports – write SQL protocols for periodic updating of
research/archival database from live, ServicePoint site. Write export
module to extract data in uniform data formats for regular research,
evaluation, and reporting functionality. Report building is expected to
take 4 to 6 weeks.
Testing – ensure that data elements are not represented by more than one field (look for
large number of missing values when frequencies run for duplicate
fields); run basic queries from the new research/archival database and
match with approved ServicePoint queries to assure accuracy and
consistency. Testing is expected to take 2 to 4 weeks.
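The two testing checks described above can be sketched in a few lines of Python. The field names, the 40 percent threshold, and the sample counts are hypothetical. The sketch assumes rows have already been read from the research/archival database into dictionaries.

```python
def missing_rates(rows, fields):
    """Share of rows in which each field is empty or None."""
    if not rows:
        return {}
    total = len(rows)
    return {
        f: sum(1 for r in rows if r.get(f) in (None, "")) / total
        for f in fields
    }

def suspect_fields(rows, fields, threshold=0.4):
    """Fields whose missing rate is high enough to suggest a duplicated element."""
    rates = missing_rates(rows, fields)
    return sorted(f for f, rate in rates.items() if rate >= threshold)

def count_mismatches(archive_counts, approved_counts):
    """Keys whose counts differ between the archive and the approved queries."""
    keys = set(archive_counts) | set(approved_counts)
    return sorted(k for k in keys if archive_counts.get(k) != approved_counts.get(k))

# Hypothetical rows in which exit destination was stored in two different fields.
rows = [
    {"client_code": "A1", "exit_dest": "rental housing", "exit_destination": ""},
    {"client_code": "A2", "exit_dest": "", "exit_destination": "family"},
    {"client_code": "A3", "exit_dest": None, "exit_destination": "unknown"},
    {"client_code": "A4", "exit_dest": "shelter", "exit_destination": ""},
]
flagged = suspect_fields(rows, ["client_code", "exit_dest", "exit_destination"])
mismatches = count_mismatches(
    {"emergency shelter": 120, "transitional housing": 45},
    {"emergency shelter": 120, "transitional housing": 44},
)
```

Two fields that are each missing in roughly half the rows are candidates for a single data element split across fields. Any key returned by `count_mismatches` would point to a query result that needs reconciling before the integration is accepted.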
Due to the technical nature of the initial database development and the intensive staff time
required to conduct the design, integration and testing, survey respondents recommended that
CSB consider contracting with a database development consultant skilled in these efforts. Local
examples of database development consulting teams include the following:
• Avenscia Inc.
• CompuWare
• Microman
• Resultdata
• Sarcom
Costs
Phase 1 – Database design, development, and testing:
Costs associated with hiring consulting services for this time-limited activity are expected to be in
the range of $15,000 to $25,000. This estimate is based on key informant interviews with
database developers and/or consultants engaged in projects of similar scale and scope. Costs
include database server hardware, software, peripherals such as tape back up, cables, and
firewalls, and database set up and building of the interface.
Phase 2 – On-going maintenance and support:
Costs associated with hiring staff to support the administration of the new research/archival
database and to perform related technology maintenance functions at the Community Shelter
Board on a 0.5 FTE basis are expected to be in the range of $25,000 to $35,000. The total
Phase 2 figure also includes a small annual contract for consultant services associated with
maintenance and support of the database.
Resource Comparison Matrix
The following chart highlights the resource requirements based on high end estimates for CSB
staff time, consulting services, hardware purchase, and annual support contracts. The total, first-year cost of the project is anticipated to be in the range of $72,500 to $77,500.
Resource Requirements – Time

                                   CSB Staff Resources   Consulting Resources
                                   (hours)               (hours)
Phase 1 (4 to 6 months):
  Provenance                                             10
  Data Modeling                                          40
  Database Development                                   60
  Analysis of problematic data                           20
  Data migration                                         16
  Build import/export reports                            40
  Testing                                                20
  Total resources for Phase 1      112                   206
Phase 2 (ongoing):
  Total resources for Phase 2      0.5 FTE               80

CSB staff hours by Phase 1 activity, in source order: 20, 20, 16, 20, 20, 16 (six values for
seven activities; the activity with no CSB staff hours is not identified).

Resource Requirements - Dollars

                                                 CSB Staff Resources   All Other Resources
                                                 (hard costs)          (hard costs)
Phase 1 (4 to 6 months)
  Consulting services for database development                         $25,000 [2]
  Hardware, software, peripherals, set up                              $7,500 [3]
Phase 2 (ongoing)
  Database manager (0.5 FTE)                     $35,000 [4]
  Database maintenance & support (annual)                              $10,000 [5]
Total First-Year Cost                            $35,000               $42,500
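The arithmetic behind these figures can be checked with a short calculation. The sketch below uses the hours, the hourly rate, and the salary stated in the report and its cost footnotes. The variable names are illustrative, and the rounding to the table values is an inference from the numbers rather than something the report states.

```python
CONSULTING_RATE = 120           # dollars per hour, from the cost footnotes
PHASE1_CONSULTING_HOURS = 206   # Phase 1 consulting total
SUPPORT_HOURS_PER_YEAR = 80     # annual support contract
FULL_TIME_SALARY = 70_000       # high end of the reported salary range
FTE = 0.5
HARDWARE = 7_500

consulting_exact = CONSULTING_RATE * PHASE1_CONSULTING_HOURS  # 24,720; table shows $25,000
support_exact = CONSULTING_RATE * SUPPORT_HOURS_PER_YEAR      # 9,600; table shows $10,000
manager = FTE * FULL_TIME_SALARY                              # 35,000

# Totals using the rounded figures shown in the table.
all_other = 25_000 + HARDWARE + 10_000   # 42,500
first_year_total = manager + all_other   # 77,500
```

The two column totals, $35,000 and $42,500, sum to $77,500, which is the high end of the first-year range quoted above.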
Future partnerships and research activities
What types of research questions would drive potential partnerships?
• Spatial analysis – mapping of client location, service locations, employment centers,
transportation networks, etc.
• Impact of welfare reform – analysis of clients who have experienced homelessness and
the impact of limitations of public assistance benefits
• Trends in homeless population counts and profiles
• Counts and profiles of special populations, e.g. families, children, veterans, mentally ill,
substance abusers, chronically homeless
• Patterns of shelter use
• Integrated database research (intersection of homelessness and government systems
such as justice, Medicaid, mental health, etc.)
• Costs of service use
[2] Consulting services are estimated at a rate of $120 per hour.
[3] Total hardware costs include purchase of database server, required software, and peripherals.
[4] Based on key stakeholder interviews a database manager demands an annual salary in the range of
$50,000 to $70,000. The position described within this report requires the resource of one half-time person
or 0.5 FTE at an estimated annual 1.0 FTE salary of $70,000.
[5] Database annual support contract is estimated at 80 hours per year at a rate of $120 per hour.
Reference Resources
(Included in hard copy format with final report)
Database Structure Descriptions:
Community Research Partners
Hearth Connection
Vera Institute of Justice
Standard Request for Database Access:
University of Massachusetts – Boston, McCormack Center for Social Policy Research
Harvard University, Radcliffe Institute for Advanced Study – Murray Research Center
The Ohio State University, Center for Urban and Regional Analysis
Security Standards & Consent Forms:
The Ohio State University, Center for Urban and Regional Analysis
University of Massachusetts – Boston, McCormack Center for Social Policy Research
University of Pennsylvania, Center for Mental Health Policy & Services Research
Harvard University, Radcliffe Institute for Advanced Study – Murray Research Center
Data Integration Standards:
University of Massachusetts – Boston, McCormack Center for Social Policy Research
Entity Relationship Diagram:
Community Research Partners
Hearth Connection