Managing survey data A data warehouse approach Russ D’Aiello Kristen Eaton

Managing survey data
A data warehouse approach
Russ D’Aiello
Database and Reporting Analyst
Kristen Eaton
Assistant Director of Institutional Research, Assessment, and Effectiveness
Mark Freeman
Vice Provost for Institutional Research, Assessment, and Effectiveness
Reasons for:
• Data coordination across multiple staff
• Many end users for the data (raw and reports)
• End users want different views of data (web)
• Link survey and institutional data in existing DW
• Link data from different surveys
Reasons against:
• Loading source data into data warehouse
• Extracting data from data warehouse
• Managing metadata
Restructuring data for survey.FactSurvey
Qualtrics-generated SPSS file
Changed field names to QuestionKey values in survey.dimQuestion
Added some fields
[Screenshot: wide-format data file, one row per respondent, with columns PIDM, SurveyKey, TermKey, and one column per QuestionKey (12130391, 12130392, 12130402 through 12130408)]
SQL Server survey.FactSurvey table
RESHAPE
Manually moved text values into a new column
http://kb.tableausoftware.com/articles/knowledgebase/addin-reshaping-data-excel#2010
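The reshape step above was done with an Excel add-in and some manual editing. As a rough illustration of the same idea, the sketch below unpivots a wide file (one row per respondent) into one row per respondent and question, and routes free-text answers into their own column. The sample rows and the output column names ResponseValue and ResponseText are assumptions made for this example, not the actual survey.FactSurvey layout.

```python
# Hypothetical wide-format rows, one per respondent (values are made up).
wide_rows = [
    {"PIDM": 1001, "SurveyKey": 12, "TermKey": 201335,
     "12130391": "4", "12130392": "Berlin"},
    {"PIDM": 1002, "SurveyKey": 12, "TermKey": 201335,
     "12130391": "5", "12130392": ""},
]
ID_FIELDS = ("PIDM", "SurveyKey", "TermKey")


def reshape_to_long(rows, id_fields=ID_FIELDS):
    """Turn one row per respondent into one row per respondent and question."""
    long_rows = []
    for row in rows:
        for field, raw in row.items():
            if field in id_fields or raw in ("", None):
                continue  # skip identifier columns and unanswered questions
            record = {name: row[name] for name in id_fields}
            record["QuestionKey"] = int(field)
            try:
                # Coded answers go in the numeric column.
                record["ResponseValue"] = int(raw)
                record["ResponseText"] = None
            except ValueError:
                # Free-text answers are moved into their own column.
                record["ResponseValue"] = None
                record["ResponseText"] = raw
            long_rows.append(record)
    return long_rows


long_rows = reshape_to_long(wide_rows)
for record in long_rows:
    print(record)
```

Keeping numeric codes and text in separate columns lets the fact table use a numeric type for coded responses while still holding open-ended answers.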
survey.dimQuestion
These values become survey.dimQuestion.Desc
(full question wording)
These values will be held in survey.dimQualtricsItem
and replaced with a database specific QuestionKey field
These values cannot be pasted directly into survey.dimResponseLabel, where they belong
survey.dimResponseLabel
In SPSS, select File >> Display Data File Information >> Working File
Getting data back out of the warehouse
Use a dynamic pivot query to rotate rows into columns, dynamically selecting field names from survey.dimQuestion (or survey.dimQualtricsItem, as needed)
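A minimal sketch of the dynamic pivot idea follows. It is not the presenters' query: on SQL Server this would more likely use the PIVOT operator inside dynamic SQL, whereas here an in-memory SQLite database and conditional aggregation stand in so the example is self-contained. The FieldName column, the unqualified table names, and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dimQuestion (QuestionKey INTEGER PRIMARY KEY, FieldName TEXT);
    CREATE TABLE FactSurvey (PIDM INTEGER, QuestionKey INTEGER, ResponseValue INTEGER);
    INSERT INTO dimQuestion VALUES (12130391, 'Q1'), (12130392, 'Q2');
    INSERT INTO FactSurvey VALUES
        (1001, 12130391, 4), (1001, 12130392, 5),
        (1002, 12130391, 3);
""")


def build_pivot_sql(connection):
    """Generate one output column per question listed in dimQuestion."""
    questions = connection.execute(
        "SELECT QuestionKey, FieldName FROM dimQuestion ORDER BY QuestionKey"
    ).fetchall()
    columns = []
    for key, name in questions:
        # Field names are spliced into SQL text, so reject anything unusual.
        if not name.isidentifier():
            raise ValueError("unsafe field name: %r" % name)
        columns.append(
            "MAX(CASE WHEN QuestionKey = %d THEN ResponseValue END) AS %s"
            % (key, name)
        )
    return (
        "SELECT PIDM, %s FROM FactSurvey GROUP BY PIDM ORDER BY PIDM"
        % ", ".join(columns)
    )


pivot_sql = build_pivot_sql(conn)
flat_rows = conn.execute(pivot_sql).fetchall()
print(flat_rows)
```

Because the column list is read from the dimension table at run time, adding a question to the survey changes the output without editing the query by hand.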
Problem: Attach metadata to the flat file
Survey data flat file: one row per ID
Full variable name? Variable type? Value labels? Missing values?
SPSS files for schools/colleges
Value labels
Rebuild
Python 2.7, pyodbc, savReaderWriter
Python sample
Write SPSS file
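The slides' Python sample is not reproduced in this text. As a hedged sketch of the rebuild step, the code below assembles the metadata a flat file lacks (variable names, variable types, full question wording, value labels) from rows like those the dimension tables might return. The row layouts and sample wording are hypothetical; in practice they would come from a pyodbc query against the warehouse.

```python
# Hypothetical rows as they might come back from the question and
# response-label dimension tables (column layout is assumed).
question_rows = [
    # (field name, full question wording, is_text)
    ("Q1", "Overall satisfaction with your program", False),
    ("Q2", "City of first job", True),
]
label_rows = [
    # (field name, coded value, value label)
    ("Q1", 1, "Very dissatisfied"),
    ("Q1", 5, "Very satisfied"),
]


def build_spss_metadata(questions, labels, text_width=255):
    """Collect variable names, types, variable labels and value labels."""
    var_names, var_types, var_labels, value_labels = [], {}, {}, {}
    for name, wording, is_text in questions:
        var_names.append(name)
        # SPSS convention: 0 means numeric, a positive number is a string width.
        var_types[name] = text_width if is_text else 0
        var_labels[name] = wording
    for name, value, label in labels:
        value_labels.setdefault(name, {})[value] = label
    return var_names, var_types, var_labels, value_labels


var_names, var_types, var_labels, value_labels = build_spss_metadata(
    question_rows, label_rows
)
print(var_names, var_types, value_labels)

# With savReaderWriter installed, these structures are meant to be passed
# roughly as follows (not executed here; check the library documentation
# for exact argument forms in your version):
#   with savReaderWriter.SavWriter(path, var_names, var_types,
#                                  valueLabels=value_labels,
#                                  varLabels=var_labels) as writer:
#       writer.writerows(records)
```

Building the dictionaries separately from the write step makes it easy to inspect the metadata before producing a file for each school or college.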
Case Study: Spring 2014 Senior Exit Survey
• Multiple end users of data in IRAE
  • Institutional Summary Report
  • College Specific Reports
  • Complete Analysis
  • Postgraduate Outcomes
• Easy method of distributing data files to schools and colleges
• This year brute force – Excel, slicing data files
• Next year: dynamic reporting tool (Tableau, QlikView) for reports and raw data access (engaged with IRT)
Benefit #1: Alignment of Source Data
• New cleaning is centrally managed
• Always have a pristine version of the data
• Simultaneous development of data management and report production
Benefit #2: Alignment and coordination of complex definitions
Benefit #3: Alignment of Same Surveys Across Time
• Trend data becomes possible
• However, still some brute force work that must be done
Benefit #4: Merge with existing data resources
Institutional Data
• Debt
• Online vs hybrid
• Degrees awarded
Other Survey Data
• Merge senior exit survey with 9 months out for most comprehensive view of postgrad outcomes
• Merge CIRP Freshman Survey with Enrolled & Senior surveys
• Enrolled survey while student is a freshman, matched with senior survey three years later
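Merging surveys like this depends on a shared respondent key. The sketch below assumes PIDM is that key and performs a full outer join of two hypothetical extracts, so students who answered only one survey are kept. The field names and values are invented for illustration.

```python
# Hypothetical extracts keyed by student ID (PIDM); values are made up.
senior_exit = {1001: {"has_job_offer": 0}, 1002: {"has_job_offer": 1}}
nine_months_out = {1001: {"employed": 1}, 1003: {"employed": 1}}


def merge_by_pidm(left, right, left_prefix, right_prefix):
    """Full outer join of two {PIDM: answers} dicts, prefixing field names."""
    merged = {}
    for pidm in sorted(set(left) | set(right)):
        row = {}
        for prefix, source in ((left_prefix, left), (right_prefix, right)):
            # A student missing from one survey contributes no fields from it.
            for field, value in source.get(pidm, {}).items():
                row["%s_%s" % (prefix, field)] = value
        merged[pidm] = row
    return merged


outcomes = merge_by_pidm(senior_exit, nine_months_out, "exit", "nine")
for pidm, row in outcomes.items():
    print(pidm, row)
```

Prefixing each field with its survey of origin avoids collisions when two surveys ask similarly named questions.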
Links
• Presentation www.drexel.edu/provost/irae/resources/presentations/
• Python 2.7 https://www.python.org/
• Python module PYODBC https://code.google.com/p/pyodbc/
• Python module SavReaderWriter http://pythonhosted.org/savReaderWriter/