Developing metrics and predictive algorithms for your institution – Marist Story
JISC Learning Analytics Network Event
Sandeep Jayaprakash, Lead Data Scientist, Marist College & Apereo
Twitter: @sandeep_jay1

Presentation Overview
Open Academic Analytics Initiative (OAAI)
◦ Objectives
◦ Data extraction & preparation
◦ Predictive models and results
◦ Impact on student success
◦ Delivering insights to end users

Predictive Analytics in Action: Open Academic Learning Analytics Initiative
Practical example: an early alert system

Open Academic Analytics Initiative
EDUCAUSE Next Generation Learning Challenges (NGLC) in the United States
Funded by the Bill & Melinda Gates Foundation
$250,000 over a 15-month period
Goal: leverage big data and analytics to create an open-source academic early alert system and research "scaling factors"

Input Data Considerations
A predictive model is usually only as good as its training data.
Good:
◦ Volume – lots of data (multiple semesters)
◦ Variety – diverse data
◦ Veracity – data should be trustworthy and support the value system
Not so good:
◦ Data quality issues
◦ Unbalanced classes (at Marist, about 6% of students are at risk – good for the student body, bad for training predictive models)

Learning Analytics Preparedness
Learning analytics is interdisciplinary and requires coordination among different groups
High-level buy-in from management is needed for smooth execution
Data comes from a wide range of systems
Ethics and policy should go hand in hand

Open Academic Analytics Initiative (OAAI)
Jisc Data Specification – GitHub link

Predictive Modeling Process

Feature Extraction – Data Quality Issues
Variability in instructors' assessment criteria
Variability in workload criteria across modules
Variability in the period used for prediction (early detection)
Variability in grading criteria across modules (partial grades with variable contribution)

Data Quality Issues
Variability in VLE tool deployment by instructors
Variability in tool usage by students
Result: missing values and holes in your data (across modules and tools)

How Do We Address Them?
Handling variability – use ratios and class averages:
◦ Activity – percent of usage over the average percent of usage per course
◦ Grades – effective weighted score over the average effective weighted score
Handling missing values:
◦ Follow an 80/20 rule when selecting metrics
◦ Perform data imputation to further enrich data quality
◦ Build cohort-based models to leverage more predictors
Sampling – balance the datasets
(Illustrative sketches of these steps follow the classifier list below.)

Predictors of Student Risk
VLE predictors were measured relative to course averages. Some predictors were discarded when not enough data was available.

Machine Learning Classifiers
◦ C4.5/C5.0 boosted decision trees
◦ Logistic regression
◦ Support vector machines
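To make the "ratios and class averages" idea concrete, here is a minimal sketch of how the ratio-based predictors and simple imputation described above might be computed with pandas. It is not the OAAI production pipeline; the column names (course_id, sessions, effective_weighted_score) and the 80% coverage threshold are illustrative assumptions.

```python
# Illustrative sketch only -- column names and thresholds are assumptions,
# not the actual OAAI feature-extraction code.
import pandas as pd

def build_ratio_features(df: pd.DataFrame) -> pd.DataFrame:
    """Express raw activity and grade metrics relative to course averages."""
    out = df.copy()
    # Activity: a student's usage as a ratio of the average usage in their course
    course_avg_sessions = out.groupby("course_id")["sessions"].transform("mean")
    out["activity_ratio"] = out["sessions"] / course_avg_sessions
    # Grades: effective weighted score over the course-average effective weighted score
    course_avg_score = out.groupby("course_id")["effective_weighted_score"].transform("mean")
    out["score_ratio"] = out["effective_weighted_score"] / course_avg_score
    return out

def select_and_impute(df: pd.DataFrame, min_coverage: float = 0.8) -> pd.DataFrame:
    """Keep metrics populated for most students (an 80/20-style rule),
    then fill the remaining holes with column means."""
    numeric = df.select_dtypes("number")
    coverage = numeric.notna().mean()
    kept = numeric[coverage[coverage >= min_coverage].index]
    return kept.fillna(kept.mean())
```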
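The slides only say that the datasets were balanced to cope with the roughly 6% at-risk rate; the exact technique is not specified. One simple option is random oversampling of the minority class, sketched below (the at_risk label column is an assumption).

```python
# Illustrative sketch: random oversampling of the minority ("at risk") class.
# The deck does not name the balancing technique used; this is one possible approach.
import pandas as pd
from sklearn.utils import resample

def balance_classes(df: pd.DataFrame, label_col: str = "at_risk", seed: int = 42) -> pd.DataFrame:
    majority = df[df[label_col] == 0]
    minority = df[df[label_col] == 1]   # roughly 6% of students at Marist
    minority_up = resample(minority, replace=True,
                           n_samples=len(majority), random_state=seed)
    # Shuffle so the training data is not ordered by class
    return pd.concat([majority, minority_up]).sample(frac=1, random_state=seed)
```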
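The classifier families above could be compared along the following lines with scikit-learn. Note that scikit-learn does not ship C4.5/C5.0, so a gradient-boosted CART ensemble stands in for the boosted decision tree here; the feature matrix X and label vector y are placeholders, and recall is used as the metric because missing an at-risk student is usually costlier than a false alarm.

```python
# Rough sketch comparing the classifier families named on the slide.
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def compare_classifiers(X, y):
    models = {
        "boosted decision tree": GradientBoostingClassifier(),
        "logistic regression": LogisticRegression(max_iter=1000),
        "support vector machine": SVC(kernel="rbf"),
    }
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5, scoring="recall")
        print(f"{name}: mean recall = {scores.mean():.2f}")
```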
Predictive Performance of the Marist Model

Research Design
Models were developed on Marist data
◦ 85% accuracy in capturing at-risk students
The OAAI system was deployed to 2,200 students across four institutions
◦ Two community colleges (FE institutions)
◦ Two Historically Black Colleges and Universities (BAME institutions)
Design: one instructor teaching three sections of the same module
◦ One section was the control; the other two were treatment groups
Each instructor received an Academic Alert Report (AAR) three times during the semester
◦ Intervals were 25%, 50% and 75% into the semester

Institutional Profiles

Predictive Model Portability Findings

Conclusion
1. Predictive models are more "portable" than anticipated.
2. It is possible to create generic models, and the "process can be ported" for use at specific types of institutions.
3. This opens up the possibility of a library of open predictive models and techniques that could be shared across institutions to Learn @ Scale.

Intervention Research Findings – Final Course Grades
Analysis showed a statistically significant positive impact on final course grades
◦ No difference between treatment groups
The impact was larger in spring than in fall
A similar trend was seen among low-income students

Intervention Research Findings – Content Mastery
Students in intervention groups were statistically more likely to "master the content" than those in controls.
◦ Content mastery = a grade of C or better
Similar results held for low-income students.

Intervention Research Findings – Withdrawals
Students in intervention groups withdrew more frequently than controls
Possibly due to students withdrawing in order to avoid grade penalties
Consistent with findings from Purdue University

Instructor Feedback
"Not only did this project directly assist my students by guiding students to resources to help them succeed, but as an instructor, it changed my pedagogy; I became more vigilant about reaching out to individual students and providing them with outlets to master necessary skills. P.S. I have to say that this semester, I received the highest volume of unsolicited positive feedback from students, who reported that they felt I provided them exceptional individual attention!"

More Research Findings…
Jayaprakash, S. M., Moody, E. W., Lauría, E. J., Regan, J. R., & Baron, J. D. (2014). Early alert of academically at-risk students: An open source analytics initiative. Journal of Learning Analytics, 1(1), 6–47.

Dashboards – Deliver Insights
Learning Activity Radar Chart
Short demo video link
Radar chart – low-risk pattern example

Early Alert Insights – Risk Quadrant
Students are plotted on a 2×2 quadrant of Student Performance against Student Engagement:
◦ High performance / high engagement
◦ High performance / low engagement
◦ Low performance / high engagement
◦ Low performance / low engagement

Future Research
◦ Expand our feature set
◦ Further minimize the percentage of false alarms we raise
◦ Scalability enhancements leveraging Hadoop/Spark
◦ Dynamic modeling capabilities
◦ More UX research on building intuitive dashboards

Join the mailing list!
analytics@apereo.org (subscribe by sending a message to analytics+subscribe@apereo.org)

Want the latest updates?
Apereo Learning Analytics Initiative
Wiki: https://confluence.sakaiproject.org/x/rIB_BQ
GitHub: https://github.com/Apereo-Learning-Analytics-Initiative
Sandeep Jayaprakash: Sandeep.Jayaprakash1@marist.edu