Managing survey data
A data warehouse approach

Russ D’Aiello, Database and Reporting Analyst
Kristen Eaton, Assistant Director of Institutional Research, Assessment, and Effectiveness
Mark Freeman, Vice Provost for Institutional Research, Assessment, and Effectiveness

Reasons for:
• Data coordination across multiple staff
• Many end users for the data (raw and reports)
• End users want different views of data (web)
• Link survey and institutional data in existing DW
• Link data from different surveys

Reasons against:
• Loading source data into data warehouse
• Extracting data from data warehouse
• Managing metadata

Restructuring data for survey.FactSurvey
Qualtrics-generated SPSS file
• Changed field names to QuestionKey values in survey.dimQuestion
• Added some fields (PIDM, SurveyKey, TermKey)
• Example wide row from the slide: PIDM 20210, SurveyKey 12, TermKey 201335; QuestionKey columns 12130391, 12130392, 12130402, 12130403, 12130404, 12130405, 12130406, 12130407, 12130408 hold Berlin, New Jerse, 4, 4, 4, 4, 4, 5, 5
RESHAPE
• http://kb.tableausoftware.com/articles/knowledgebase/addin-reshaping-data-excel#2010
SQL Server survey.FactSurvey table
• Manually moved text values into a new column

survey.dimQuestion
• These values become survey.dimQuestion.Desc (full question wording)
• These values will be held in survey.dimQualtricsItem and replaced with a database-specific QuestionKey field
• These values cannot be pasted directly into survey.dimResponseLabel, where they belong

survey.dimResponseLabel
In SPSS, select:
• File >>
• Data file information >>
• Working File

Getting data back out of the warehouse
• Use a dynamic pivot query to rotate rows into columns, dynamically selecting field names from survey.dimQuestion (or survey.dimQualtricsItem, as needed)

Problem: Attach metadata to the flat file
Survey data flat file: one row per ID
• Full variable name?
• Variable type?
• Value labels?
• Missing values?

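The round trip described above (reshape a wide file into one row per response, then pivot back out with column names taken from the question dimension) can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 as a stand-in for SQL Server, conditional aggregation in place of a SQL Server PIVOT, and simplified tables. The sample rows, the question keys Q1 and Q2, and the Response column are hypothetical and are not taken from the presentation.

```python
import sqlite3

# Hypothetical wide export: one row per respondent, one column per QuestionKey.
wide_rows = [
    {"PIDM": 1001, "SurveyKey": 12, "TermKey": 201335, "Q1": 4, "Q2": 5},
    {"PIDM": 1002, "SurveyKey": 12, "TermKey": 201335, "Q1": 3, "Q2": None},
]
ID_FIELDS = ("PIDM", "SurveyKey", "TermKey")


def wide_to_long(rows, id_fields=ID_FIELDS):
    """Reshape: one (ids..., QuestionKey, Response) tuple per answered question."""
    long_rows = []
    for row in rows:
        ids = tuple(row[f] for f in id_fields)
        for key in sorted(k for k in row if k not in id_fields):
            if row[key] is not None:  # unanswered questions produce no fact row
                long_rows.append(ids + (key, row[key]))
    return long_rows


def load(conn, long_rows):
    """Create simplified stand-ins for the fact and question tables and fill them."""
    conn.execute(
        "CREATE TABLE FactSurvey (PIDM INTEGER, SurveyKey INTEGER, "
        "TermKey INTEGER, QuestionKey TEXT, Response INTEGER)"
    )
    conn.execute("CREATE TABLE dimQuestion (QuestionKey TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO FactSurvey VALUES (?, ?, ?, ?, ?)", long_rows)
    conn.executemany(
        "INSERT OR IGNORE INTO dimQuestion VALUES (?)",
        [(r[3],) for r in long_rows],
    )


def dynamic_pivot(conn, survey_key):
    """Rotate rows into columns; the column list is read from dimQuestion."""
    keys = [r[0] for r in conn.execute(
        "SELECT QuestionKey FROM dimQuestion ORDER BY QuestionKey")]
    if not keys:
        raise ValueError("dimQuestion is empty; nothing to pivot")
    # One aggregated CASE expression per question; values are bound as parameters,
    # aliases are quoted identifiers.
    select_list = ", ".join(
        'MAX(CASE WHEN QuestionKey = ? THEN Response END) AS "{}"'.format(
            k.replace('"', '""'))
        for k in keys
    )
    sql = ("SELECT PIDM, {} FROM FactSurvey WHERE SurveyKey = ? "
           "GROUP BY PIDM ORDER BY PIDM").format(select_list)
    cur = conn.execute(sql, keys + [survey_key])
    header = [d[0] for d in cur.description]
    return header, cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, wide_to_long(wide_rows))
    header, rows = dynamic_pivot(conn, 12)
    print(header)
    for row in rows:
        print(row)
```

Because the column list is built from the dimension table at query time, adding a question to the dimension adds a column to the flat file without editing the query. Against SQL Server the same idea would typically be expressed with a dynamically assembled PIVOT statement run through PYODBC.
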
SPSS files for schools/colleges
• Value labels: rebuild
• Tools: Python 2.7, PYODBC, SavReaderWriter
• Python sample: write SPSS file

Case Study: Spring 2014 Senior Exit Survey
• Multiple end users of data in IRAE
  • Institutional Summary Report
  • College Specific Reports
  • Complete Analysis
  • Postgraduate Outcomes
• Easy method of distributing data files to schools and colleges
  • This year: brute force (Excel, slicing data files)
  • Next year: dynamic reporting tool (Tableau, QlikView) for reports and raw data access (engaged with IRT)

Benefit #1: Alignment of Source Data
• New cleaning is centrally managed
• Always have a pristine version of the data
• Simultaneous development of data management and report production

Benefit #2: Alignment and coordination of complex definitions

Benefit #3: Alignment of Same Surveys Across Time
• Trend data becomes possible
• However, still some brute force work that must be done

Benefit #4: Merge with existing data resources
Institutional Data
• Debt
• Online vs hybrid
• Degrees awarded
Other Survey Data
• Merge senior exit survey with 9 months out for most comprehensive view of postgrad outcomes
• Merge CIRP Freshman Survey with Enrolled & Senior surveys
• Enrolled survey while student is a freshman, match with senior survey three years later

Links
• Presentation: www.drexel.edu/provost/irae/resources/presentations/
• Python 2.7: https://www.python.org/
• Python module PYODBC: https://code.google.com/p/pyodbc/
• Python module SavReaderWriter: http://pythonhosted.org/savReaderWriter/