Open Academic Analytics Initiative
Prof. Eitel Lauria

Learning analytics is the term used to describe the application of data mining techniques to develop predictive models that can help monitor and anticipate student performance and take action on issues related to student teaching and learning. Learning analytics, which "combines select institutional data, statistical analysis, and predictive modeling to create intelligence upon which students, instructors, or administrators"[1] (Baepler & Murdoch, 2010) can act as a means to improve academic success, holds great potential to provide new and innovative technological tools for improving course and degree completion.

Marist College is conducting pioneering research in this emerging field, starting with the Open Academic Analytics Initiative (OAAI)[2], a project supported through the Next Generation Learning Challenges (NGLC) program and primarily funded by the Bill & Melinda Gates Foundation. Through this two-year project, the OAAI developed predictive models of academically at-risk students using Marist College's data and successfully ran pilots at four partner institutions (community colleges and historically black colleges and universities around the country), generating academic alert reports and helping deploy interventions for more than 1,000 students, with the purpose of improving their chances of academic success. The OAAI has received wide recognition for its work: it is a 2013 Computerworld Honors Laureate in the Emerging Technology category[3][4] and the recipient of Campus Technology magazine's 2013 Innovators Award in the Teaching and Learning category.

From a predictive modeling perspective, the project has shown that:
a) it is feasible to implement an open-source early-alert prototype for higher education;
b) predictive models can support early detection of at-risk students;
c) there is initial evidence that predictive models can be ported to other institutions (with some limits and considerations); and
d) certain predictors (such as partial contributions to the final grade) have great predictive power for early detection of students at academic risk.

These findings have also opened a number of avenues of research that we are actively pursuing and that require, or will require, computing power. It is our intention to build a large data warehouse of academic data from which libraries of predictive models can be built and subsequently applied to institutions with similar characteristics.

The predictive models built for early detection of students at risk included a wide range of predictors extracted from multiple sources, among them demographic data, previous academic performance, student interaction with the learning management system (LMS), and partial contributions to the final grade. We theorized that students' interactions with the LMS would have considerable predictive power (e.g., a student who rarely accesses the LMS would probably do poorly in the course, as opposed to students who consistently log in to LMS sessions and access course materials).
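For illustration, the sketch below shows how an early-alert classifier could be assembled from these kinds of predictors. The file name, column names, and the choice of a scikit-learn logistic regression pipeline are assumptions made for the example; they are not the OAAI implementation.

```python
# Hypothetical sketch: an early-alert classifier built from the kinds of
# predictors described above. The CSV file and column names are illustrative.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical per-student, per-course snapshot taken early in the semester.
df = pd.read_csv("student_course_snapshot.csv")

categorical = ["gender", "enrollment_status"]   # demographic data
numeric = [
    "cum_gpa",              # previous academic performance
    "lms_sessions_ratio",   # student LMS sessions / course-average sessions
    "lms_content_ratio",    # content accesses / course-average accesses
    "partial_grade_pct",    # partial contributions to the final grade so far
]

X = df[categorical + numeric]
y = df["at_risk"]           # 1 = final grade fell below the at-risk threshold

preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ("num", StandardScaler(), numeric),
])

model = Pipeline([
    ("prep", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```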
The predictive models built as part of our research have shown limited predictive power for the LMS-related features when compared to other predictors, and this certainly deserves further exploration. One possible explanation is that we used aggregates (ratios of a student's interaction over the course-average interaction) instead of measures of regularity (e.g., it could be inferred that good students have a constant flow of interaction, whereas at-risk students have, at most, bursts of interaction with the LMS). Part of our current research includes the identification of patterns (signatures) of interaction that could help profile different types of students. These pattern recognition tasks, however, require the analysis of much larger volumes of data, since the analysis is no longer limited to aggregates.

The activities described above demand processing very large numbers of records coming from different sources. Our current database platform is mostly relational, but it is our goal to move to software stacks better equipped for handling large volumes of data (e.g., Hadoop, and Spark with its machine learning libraries). The LinuxONE platform includes these and other software stacks that could greatly enhance and speed up our ongoing research. (A sketch of the kind of regularity-feature extraction this would enable follows the notes below.)

Notes
[1] Baepler, P., & Murdoch, C. J. (2010). Academic Analytics and Data Mining in Higher Education. International Journal for the Scholarship of Teaching and Learning, 4(2).
[2] https://confluence.sakaiproject.org/pages/viewpage.action?pageId=75671025
[3] http://www.eiseverywhere.com/ehome/49069/83917/?&
[4] http://www.marist.edu/publicaffairs/computerworldoaai2013.html
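As an illustration of the regularity measures discussed above, here is a minimal PySpark sketch that derives per-student activity-regularity features from event-level LMS logs. The Parquet file and column names (student_id, course_id, event_date) are assumptions for the example, and the coefficient-of-variation "burstiness" measure is just one possible way to quantify regularity, not a feature the OAAI has adopted.

```python
# Hypothetical sketch: deriving regularity features from raw LMS event logs
# with Spark, instead of a single aggregate ratio. Names are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("lms-regularity-features").getOrCreate()

# Event-level log: one row per LMS event (student_id, course_id, event_date, ...)
events = spark.read.parquet("lms_events.parquet")

# Daily event counts per student per course.
daily = (events
         .groupBy("student_id", "course_id", "event_date")
         .agg(F.count("*").alias("events")))

# Regularity profile: a student with a constant flow of activity has a low
# coefficient of variation (stddev / mean); bursty activity yields a high one.
# The number of distinct active days captures how spread out the activity is.
regularity = (daily
              .groupBy("student_id", "course_id")
              .agg(F.mean("events").alias("mean_daily_events"),
                   F.stddev("events").alias("sd_daily_events"),
                   F.countDistinct("event_date").alias("active_days"))
              .withColumn("burstiness",
                          F.col("sd_daily_events") / F.col("mean_daily_events")))

# These features could then be joined to the modeling dataset or fed into
# Spark's machine learning libraries.
regularity.write.mode("overwrite").parquet("lms_regularity_features.parquet")
```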