Program Review Support Tool Nathan Pellegrin Research Analyst Goals Background and purpose of the tool Demonstration Cal-PASS update OLAP Development at Cal-PASS OLAP Success Story: SSPIRE Cube Future development The Program Review Support Tool • • • • Funded by Hewlett. Currently being tested by several colleges. All data is from MIS. Caveat: figures may not match what is found in locally produced reports due to differences in master data sources and formulae used to derive figures. • Like a “data smorgasbord” and includes – student demographics – course grades – TOP code course hierarchy … the menu will be expanding ! Purpose • Not the product of a mandate or requirement from the Chancellors office. • Not intended to take the place of local tools. • Not intended to drive evaluation activities. • Intended for use by colleges as an optional FREE tool in their program review process. • Obtain feedback from users to scale and improve our data model and OLAP infrastructure. Cal-PASS Statistics Over 300,000,000 records Up to 15 years of data in some regions Over 7,000 schools, colleges and university members Over 150 research studies conducted in the last two years Sixty-six Professional Learning Councils (1,200+ faculty) Universities (23) CSU . •Channel Islands •Dominguez Hills •Fresno •Long Beach •Los Angeles •Monterey Bay •Pomona •Sacramento •San Bernardino •San Marcos •Stanislaus •San Francisco •San Jose •Sonoma UC . •Davis •Merced •Riverside •San Diego •Santa Barbara •Santa Cruz PRIVATE . •Otis College of Art and Design •National University •University of the Pacific Changing the Paradigm: OLAP Applications • OLAP = On-Line Analytical Processing • Like Excel pivot tables, except Excel handles only two dimensional data. • Stores pre-computed aggregations of data with B-Tree indexing for delivering fast retrieval times and fast calculation. • Enables users to perform analysis of data quickly with dragand-drop manipulation of variables and dynamic visualization. • Web-based for easy access – all processing is performed on the server so it does not tie up your work station (zero footprint). • Big time savings! • Ideal for the action research paradigm and design research. 3 Layers of the Application User Interface - Dundas MDX OLAP Cube - SSAS SQL Database(s) – SQL Server Development Process of the OLAP project is a technical collaboration between IT and Research … • • • • Server Architecture/O.R. – Alex Zakharenkov (IT) Submission Processes/User Interface – Nick Wade (IT) Data Model/ETL – Nathan Pellegrin (Research) Design/Feedback of OLAP cubes - All IT and Research Staff, including Terrence Willett and Mary Kay Patton Development times • Development of initial Dim Model started in July 2008 … incremental additions/changes congealed into a (basic) model by February, 2009. • Initial development of Program Review, including feedback and changes ≈ 8 weeks. • Dim model ETL execution ≈ 15 hours. • Processing of OLAP cube ≈ 20 min./300K rows . • Initial deployment of UI ≈ 3 weeks. Several changes since then. • UI required tweaks to OLAP cube design. Development Tools • .NET • SQL Server – storage – Integration Services – Analysis Services • BIDS • Dundas Cal-PASS Data Flow Dimensional Model •Analytical •Integrated K-12/CC/Univ •Time-dependent •2NF (Redundant CK) •Optimized Indexing ETL Object Repository 0607,01612590000001,0000179441,U,,12,ABARCA,CARL OS,09091988,M,500,,,,,,,,,,,,,,,,,,,01,15,1,,,,,,N,010,Y,,,,,,,,,,,,,, ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,01,10032006,N,275,0,0,4,24,0,0,0,0,0,0,,1 0607,01612590000001,000 0032006,X,,,,,,,,,,,,,,,,,,,,,,,N,Y,,,8,,,,,,,,,, 0179441,U,,12,ABARCA,CARLO 0607,01612590000001,0000154281,9107510861,,11,BLAC S,09091988,M,500,,,,,, K,BRITNI,11291990,F,600,,,,,,,,,,,,,,,,,,,00,13,1,,,,,,N,000,N,,, ,,,,,,,,,,,,,01,15,1, ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,01,10032006,N,302,1,8,6,35,3,15,5 ,,,,,N,010,Y,,,,,,,,, ,28,4,33,01,10032006,N,340,5,71,12,67,18,90,0,0,7,47,2.5,,,, ,,,,,,,,,,,,,,,,,,,,, ,,,,,,,N,N,U,72,80,,,,,,,,,, ,,,,,,,,,,,,,, 0607,01612590000001,0000159553,U,,11,BOWIE,EARLIS 9107510861,,11,BLACK,BRITNI,11291990,F,600,,,,,,,,,,,,,,,, HA,10231988,F,999,,,,,,,,,,,,,,,,,,,00,14,1,,,,,,Y,060,Y,,Y,,,,,,,,,, ,,,00,13,1,,,,,,N,000,N,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,01,1003200 ,,,,,,,,,,,,,Y,,,,,,,,,,,,,,,,,,,01,10032006,N,278,3,23,4,24,4,20,0,0, 6,N,302,1,8,6,35,3,15,5,28,4,33,01,10032006,N,340,5,71,12, 0,0,,10032006,A,,,,,,,,,,,,,,,,,,,,,,,N,N,,,40,,,,,,,,,, 67,18,90,0,0,7,47,2.5,,,,,,,,,,,N,N,U,72,80,,,,,,,,,, 0607,01612590000001,0000161233,U,, 0607,01612590000001,0000159553,U,,11,BOWIE,EARLIS HA,10231988,F,999,,,,,,,,,,,,,,,,,,,00,14,1,,,,,,Y,060,Y,,Y,,,,,,,,,, ,,,,,,,,,,,,,Y,,,,,,,,,,,,,,,,,,,01,10032006,N,278,3,23,4,24,4,20,0,0, 0,0,,10032006,A,,,,,,,,,,,,,,,,,,,,,,,N,N,,,40,,,,,,,,,, 0607,01612590000001,0000161233,U,, CUSTOM FILES ETL Cal-PASS Submission Universities UC UC UC •Semistructured data •Format/value Validation K-12 CSU CSU Private •Storage •Application Integration •Key-value pairs (KVP) design CCCCO MIS LA What does a dimensional data model do for Cal-PASS? UNIFY: Data from across segments is integrated into a unified dataset. STANDARDIZE: Table and field names, data types and value coding systems are standardized to be the same for all segments. SIMPLIFY: The number of tables and fields used to store the data is reduced. Granularity of tables are at the units of analysis. Table relationships reflect analytical relationships between entities. IMPROVE DATA QUALITY: Merging data brings data quality issues to light so they can be noted and/or resolved. Establishing primary and foreign key relationships enforces referential integrity. Multiple student identifiers are unified to produce a single “metakey” Missing course CBEDS classifications imputed using machine learning. REDUCE RISK: Without it, in order to produce one metric for all segments separate analytical data processing pathways are required for each segment, which means more maintenance and increased risk of inconsistent results. Using a dimensional model the analytical computations and services are centralized. BOOST PRODUCTIVITY: The simpler, cleaner data model makes it easier to develop cubes with re-usable components, generalized for all segments. Currently, analytical data processing must be developed separately for each segment. Using a dimensional model, only one pathway needs to be developed that applies to all segments. Organization Cal-PASS Unified Dimensional Data Model (Selected Tables) Student Status Student Course Outcome Course Taxonomy Course Award = Fact Table = Dimension Table Term = Foreign Key Relationship The Ideal: Centralization of analytical query processing Presentation & User Engagement OLAP Each statistic can emerge at multiple presentation points, but there is only one logical control point. Views and Stored Proc’s Dimensional Model Tables User-defined cohorts; model outputs Student identifiers from each source system are mapped to a new identifier through transitive closure of all connected values (using a modified version of the Floyd–Warshall algorithm). Local district student id Name + gender + DOB CSIS SSID CCCCO SID (SB00) n1 d1 d2 o1 n2 c1 d3 Each edge represents a record linking two values of different identifiers in submitted student records. m1 OLAP Success Story: SSPIRE Cube • • • • • Funded by Irvine Foundation. Currently used by nine colleges. Incorporates MIS data with data submitted by colleges (custom files). Tracks cohorts of students. Demonstrate using Merced college (thank you Dr. Duran!) Program Review Support Tool This is only the beginning… • Provide access to K-12 districts and Universities • Inter-segmental OLAP Cubes • Link non-academic outcomes (Employment Development Department, Child Welfare Data System) “Success at Every Level” Education Data and Information Act of 2008 SB 1298 1. convene a high-level working group to decide the best the governance structure for the comprehensive education data system; 2. directs the State Chief Information Office (CIO), in consultation with educators and education policymakers, to prepare a strategy plan outlining a clear path for technical implementation; and 3. requires the various education segments to begin using a common student identifier, so that once a governance structure and technical architecture are in place we can begin linking records from pre-k through the university with relative ease and speed. Source: http://www.senatorsimitian.com/legislation/entry/sb_1298_education_information_system/ High level cross-agency systems map of key collections Source: http://www.senatorsimitian.com/legislation/entry/sb_1298_education_information_system/ CDE Data Systems In development Existing WORKING DRAFT NOT EXHAUSTIVE Direct data sharing* Data sharing through local agencies Non CDE Data Systems National Student Clearinghouse AYP/API CASAS TOPS Pro CASEMIS CSU ERS CCC COMIS CALTIDES** Assessments CPEC CALPADS** Early Childcare Migrant Prisons, Census CALPASS CCTC CASE SACS Other CDE systems/ units including CDS, Charter schools Source: Planned/ potential UC CSS ConApps Other CDE units including Homeless, CALSAFE, Title 3, Private Schools etc. CDPH EDD From Franchise tax, benefits system etc. * Does not imply direct data linkages. Only state system linkages shown ** CALPADS is envisioned to replace much of the CBEDS, Language Census, Student National Origin Report and select Consolidated Application data *** CALTIDES is envisioned to collect data primarily from CALPADS and Commission on Teacher Credentialing’s CCTC’s Credential Automation System Enterprise CASE system Interviews with respective agencies, RAND, team analysis High level system profiles of key CDE collections (1/2) DRAFT Source: http://www.senatorsimitian.com/legislation/entry/sb_1298_education_information_system/ System name Description Key identifier Data categories Granularity Data sharing CALPADS California Longitudinal Pupil Achievement Data System. System (under development) for tracking K12 students longitudinally, that will replace CBEDS collections SSID Student demographic, program participation, grade level, enrollment, course enrollment and completion, discipline, and statewide assessment Student Planned includeAssessments, API/AYP, Migrant, ConApps, CALTIDES California Longitudinal Teacher Integrated Data Education System. Iintegrated data system for teacher data based on unique SEID SEID Teacher credentials, authorizations, teacher participation program, alternative routes, participation in Beginning Teacher Support and intern program, SEID, Salary Student Planned includeCALPADS, CCTC CASE CASEMIS California Special Education Management Information System. Integrated data system for special education students on students, services and provider programs SSID Attendance/Enrollment, Disciplinary, Education Agency, Mobility, Special Education, Staffing Data, Student Demographic, Other (services, age, gender, race/ethnicity) Student, School None at state district, School, level county, region Assessments California High School Exit Exam SSID CAHSEE, Standardized Testing and Reporting STAR and CELDT Attendance/Enrollment, Education Agency, Food and Nutrition, Parent Data, Special Education, Student Demographic Student, School CASAS, District, Migrant, School, County AYP/API, CALPADS (planned) API/ AYP Accountability related information CDS code based on California's Public Schools Accountability Act of 1999 as well as No Child Left Behind Act of 2001 AYP/API score by student characteristics School Source: Respective CDE departments Assessments High level system profiles of key CDE collections (2/2) DRAFT Source: http://www.senatorsimitian.com/legislation/entry/sb_1298_education_information_system/ System name Description Key identifier Data categories Granularity Data sharing Migrant Student enrollments in migrant education programs. Includes migrant education forms and a directory of offices providing services Migrant ID, COE number, CDS code Student demographics, educational programs, counseling, health and support services, emergency health, clothing, food, transportation Student Assessments, CALPADS (planned) SACS Standardized Account Code Structure. Offers LEAs with a means of reporting financial information CDS code For every general ledger accounting transaction- information on funds, resources, project year, goal, function, and object. Includes information on Attendance/Enrollment, Education Agency, Fiscal, Transportation School, District CDS, Charter schools ConAPPS Consolidated applications. Includes information on categorical programs e.g., Title I, II, V etc. CDS code Student demographic, Title I, III, V, Part A, Immigrant, LEP, funding model, charter status, Gradespan, participants School, District, County CALPADS (planned) Early Childcare Systems CD-801A,B, CDMIS, Special Education SSID Desired Result System SEDRS, and CD 9600 Child demographics, IEP flag, family identification/case number, household name, type of program, DRDP Desired Result Development Profile, Early Childhood Environment Rating Scale ECERS Student None CASAS TOPSPro Comprehensive Adult Student Assessment Systems. System for tracking Students in Adult Education Programs Student demographics, Agency, instruction Student level and program, assessment scores, date of entry, reason for exit, class number, attainable goal within program year Source: Respective CDE departments ADA ID, SSID, CASAS no Assessments 2 High level system profiles of non-CDE collections DRAFT Source: http://www.senatorsimitian.com/legislation/entry/sb_1298_education_information_system/ System name Description Key identifier Data categories Granularity Data sharing CPEC California Post Secondary Education Commission. Data system for Higher Ed- post secondary systems Student ID based of SSN Demographic, IEP, grade level, program, Graduation rate, teacher, institution Student CDE, CSU, UC,CCC, prison, census UC CSS Corporate Student System provides SSN information on student enrollment and performance for University of California campuses Student demographic, income, financial aid, Student education history, assessment CDE, CCC, CALPASS CCC COMIS California Community Colleges SSN, Student Management Information System. ID COMIS data is used to prepare reports for Federal and State reports including Integrated Postsecondary Education Data System (IPEDS) and to track student outcomes student demographic, income, financial aid, education history, assessment, teacher, institution Student CALPASS, CPEC, CSU, EDD, National Student Clearinghouse CSU ERS Enrollment Recording System is used by Cal State to track student retention and graduation to support regular term reports, IPEDS, and state budget requests SSN Student demographic, financial aid, education history, assessment Student CPEC, CALPASS, CCC CDPH California Department of Public Health. System use to track CDPH ID Case ID and demographics, clinical and diagnostic data Case None EDD Employment Development Database ID based of SSN Wages, payroll taxes, unemployment tracking, job matching, job training Employee Franchise tax, benefits system, CCC Source: Respective agencies, RAND 2 Thank you! Have fun …. HAPPY DATA! Data Data