Open Academic Analytics Initiative Prof. Eitel Lauria

Learning analytics is the application of data mining techniques to develop predictive models that
help monitor and anticipate student performance and act on issues related to teaching and
learning. Learning analytics, which “combines select institutional data, statistical analysis, and
predictive modeling to create intelligence upon which students, instructors, or administrators”1
(Baepler & Murdoch, 2010) can act as a means to improve academic success, holds great potential to
provide new and innovative technological tools for improving course and degree completion.
Marist College is conducting pioneering research in this emerging field, starting with the Open
Academic Analytics Initiative (OAAI)2, a project supported through the Next Generation Learning
Challenges (NGLC) and primarily funded by the Bill & Melinda Gates Foundation. Through this
two-year project, the OAAI has developed predictive models of academically at-risk students
using Marist College’s data, and we have successfully run pilots at four partner institutions
(community colleges and historically black colleges and universities around the country),
generating academic alert reports and helping deploy interventions for more than 1,000 students,
with the purpose of improving their chances of academic success.
The Marist College OAAI has received wide recognition for its work: it is a 2013 Computerworld
Honors Laureate in the Emerging Technology category3, 4, and the recipient of the Campus
Technology Magazine 2013 Innovator's Award in the Teaching and Learning category.
From a predictive modeling perspective, the project has shown that a) it is feasible to implement
an open-source early alert prototype for higher education; b) predictive models can help detect
at-risk students early; c) there is initial evidence that predictive models can be ported to
other institutions (with some limits and considerations); and d) certain predictors (such as
partial contributions to the final grade) have great predictive power for early detection of
students at academic risk.
These findings have also opened a number of avenues of research that we are actively pursuing
and that require, or will require, computing power:
a) It is our intention to build a large data warehouse of academic data from which libraries of
predictive models can be built and subsequently applied at institutions with similar
characteristics. The predictive models built for early detection of students at risk included
a wide range of predictors extracted from multiple sources, among them demographic data,
previous academic performance, student interaction with the learning management system (LMS),
and partial contributions to the final grade. We theorized that students' interactions with
the learning management system would have considerable predictive power (e.g. a student
who rarely accesses the LMS would probably do poorly in the course, as opposed to
students who consistently log in to LMS sessions and access course materials).
1 Baepler, P. and C.J. Murdoch, Academic Analytics and Data Mining in Higher Education.
International Journal for the Scholarship of Teaching and Learning, 2010. 4(2).
2 https://confluence.sakaiproject.org/pages/viewpage.action?pageId=75671025
3 http://www.eiseverywhere.com/ehome/49069/83917/?&
4 http://www.marist.edu/publicaffairs/computerworldoaai2013.html
The predictive models built as part of our research have shown that LMS-related features have
limited predictive power when compared to other predictors, and this certainly deserves further
exploration. One possible explanation is that we have used aggregates (ratios of a student's
interaction to the course average interaction) instead of measures of regularity (e.g. it could
be inferred that good students have a constant flow of interaction, whereas at-risk students have
at most bursts of interaction with the LMS). Part of our current research includes the
identification of patterns (signatures) of interaction that could help profile different types of
students. These pattern recognition tasks, however, require the analysis of much larger volumes of
data, because the analysis is no longer limited to aggregates.
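The difference between an aggregate and a regularity measure can be illustrated with a minimal
sketch. Everything in it is an assumption made for illustration and is not taken from the OAAI
models: the weekly event counts are invented, the function names are hypothetical, and the
coefficient of variation is only one possible stand-in for "regularity".

```python
from statistics import mean, pstdev


def interaction_ratio(student_weekly, course_weekly_avg):
    """Aggregate feature: student's total LMS events over the course-average total."""
    course_total = sum(course_weekly_avg)
    return sum(student_weekly) / course_total if course_total else 0.0


def regularity(student_weekly):
    """Coefficient of variation of weekly counts; lower values mean steadier activity."""
    m = mean(student_weekly)
    return pstdev(student_weekly) / m if m else float("inf")


# Hypothetical weekly LMS event counts over an eight-week window (invented data).
steady = [10, 10, 10, 10, 10, 10, 10, 10]
bursty = [0, 0, 40, 0, 0, 0, 40, 0]
course_avg = [10] * 8

for name, counts in (("steady", steady), ("bursty", bursty)):
    print(name, interaction_ratio(counts, course_avg), round(regularity(counts), 2))
```

In this toy data both students have the same aggregate ratio, so an aggregate-only feature cannot
tell them apart, while the regularity measure separates the steady student from the bursty one.
Computing such measures needs the per-period event records rather than a single total per student.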
The activities described above demand processing very large numbers of records coming from
different sources. Our current database platform is mostly relational, but our goal is to move to
software stacks better equipped for handling large volumes of data (e.g. Hadoop, Spark and its
machine learning libraries).
The LinuxOne platform includes these and other software stacks that could greatly enhance and
speed up our ongoing research.