Learning Analytics – Tools & Techniques For Analysing Large Volumes Of Educational Data
Dr. Richard Price, Data Scientist, Planning Services, Flinders University

History of The Techniques and Methods of Learning Analytics
• Learning analytics draws upon techniques from a number of established fields:
– Statistics
– Artificial Intelligence
– Machine Learning
– Data mining
– Social Network Analysis
– Text Mining and Web Analytics
– Operational Research
– Information Visualization
• Application domains such as business intelligence, national security intelligence and learning analytics all have an interest in analysing large volumes of data from disparate data sources and are providing the business cases for the rapid growth in ‘big data’ & data analytics.
• Learning analytics encompasses support to both the business and teaching functions of the learning institution.

Data Types
• Structured data
– Typically stored in databases or spreadsheets, required to be managed in accordance with a standardised storage format and ontology e.g. names, place names
– E.g. SATAC applications, load, enrolments, FLO usage data
• Unstructured data
– text, audio, imagery, video
– E.g. student email, chat rooms, questionnaire responses, lecture videos (audio & video)
• Different data types lend themselves to different analytical techniques.
Unstructured data often requires pre-processing to enable structured data analysis.
• Unstructured data analysis
– Text: document clustering, topic detection, entity extraction (people, places, locations, dates, times etc.), sentiment analysis (+, -)
– Audio: speaker identification, language identification, speech to text, keyword spotting
– Video: face recognition, object recognition, target tracking

Structured Data Analysis
• Descriptive statistics – sums, means, standard deviations, basic plotting (graphs, charts, histograms)
• Data visualisation – tools that enable the human to see meaningful patterns in data
• Machine learning – tools that enable computers to find patterns in data to perform classification, clustering or prediction, e.g. decision trees, neural networks, support vector machines, linear regression, self organising maps, k-means
• Predictive analytics – algorithmic approaches (generally machine learning) for predicting key target variables of interest. Example LA projects: identification of ‘at risk’ students (Student Success Project), future University enrolments, topic enrolments

Data Visualisation
[Figures: structured data; unstructured data]

Advanced Data Visualisation
[Figure: combining structured & unstructured data sources]

Predicting Enrolments From Applications Data
• Aim: To predict next year’s commencing load using the past 3 years of SATAC applications data.
• Predictions made at the applicant level – not time series based.
• Adopted a decision tree machine learning approach.
• Input variables for each applicant included: academic performance, schooling, demographics (e.g. age, gender and postcode), information on each of the applicant’s preferences (preference number, course, institution, institution campus) and a number of proximity variables.
• Output (target) variable: whether the student was enrolled at Flinders University at the Semester 1 census.
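The applicant-level setup above (categorical inputs, a binary enrolled / not enrolled target) can be illustrated with a small sketch. Chi-square-based trees such as CHAID rate candidate inputs by a chi-square test of independence against the target. The code below computes only the Pearson chi-square statistic for each input and picks the largest, which is a simplification: it omits the category merging and adjusted p-values a full CHAID implementation would use. The applicant records and the `near_campus` input are made up for illustration and are not SATAC data.

```python
from collections import Counter

def chi_square(values, targets):
    """Pearson chi-square statistic for one categorical input vs. a binary target."""
    n = len(values)
    observed = Counter(zip(values, targets))   # cell counts
    row_totals = Counter(values)               # count per input category
    col_totals = Counter(targets)              # count per target class
    stat = 0.0
    for value, row_n in row_totals.items():
        for target, col_n in col_totals.items():
            expected = row_n * col_n / n       # count expected under independence
            stat += (observed[(value, target)] - expected) ** 2 / expected
    return stat

# Hypothetical toy applicants: (preference_number, near_campus, enrolled)
applicants = [
    (1, "yes", 1), (1, "yes", 1), (1, "no", 1), (1, "no", 0),
    (2, "yes", 0), (2, "no", 0), (2, "yes", 0), (2, "no", 1),
]
enrolled = [a[2] for a in applicants]

# Score each candidate input against the enrolment target.
scores = {
    "preference_number": chi_square([a[0] for a in applicants], enrolled),
    "near_campus": chi_square([a[1] for a in applicants], enrolled),
}

# The input with the strongest association would be the first split candidate.
best_input = max(scores, key=scores.get)
print(scores, best_input)
```

In this toy data the preference number is associated with enrolment and `near_campus` is not, so the preference number would be tried first as a split. Comparing raw statistics is only reasonable here because both inputs have the same number of categories.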
Predicting Enrolments From Applications Data
• The three P’s – Prestige, Proximity & Price
• Proximity input variables
– For two given points P1 = (lat1, lon1), P2 = (lat2, lon2), the haversine distance in kilometres between P1 and P2 is defined as:
d(P1, P2) = ACOS(SIN(lat1)*SIN(lat2) + COS(lat1)*COS(lat2)*COS(lon2 - lon1)) * 6371
where latitudes and longitudes are in radians and 6371 is the Earth’s mean radius in km. (This is the spherical law of cosines form of the great-circle distance; it gives the same distance as the haversine formula.)
– Distance calculated between the applicant’s primary residence and all major SA University campuses, with each value being an input into the machine learning algorithm.
• Two models developed: a) from the 1st week in September, b) from the 2nd week in January.
• Training data consisted of 3 years of data (2011, 2012 & 2013) to predict 2014 enrolments – 25,551 training examples for September and 74,516 for January.
• A number of commonly used machine learning algorithms could have been used; we chose to adopt a CHAID decision tree algorithm.

Predicting Enrolments From Applications Data
• Results:

Model                                 September   January
Number Of Applicants (Predictions)    8557        26457
Predicted Commencing Load             1394        4340
Actual Commencing Load                1365        3858
% Error                               2.1         12.5

• Lift Versus Output Percentile Profiles For the September Model
[Figures: Training; Validation]

Predicting Enrolments From Applications Data
• The strong consistency of the lift profiles between the training results and the test and validation results is indicative of structural patterns of behaviour that appear to exist across applicants to South Australian Universities.
• These patterns of behaviour appear to be captured by the rules contained within the decision trees produced during the training stage of the modelling process.
• A paper reporting this work has been accepted for presentation at the Australian Association for Institutional Research Forum in November, with possible publication in the Journal of Institutional Research.
• If future years’ performance proves to be similar, the approach should be able to provide valuable support to the management of the applications process.
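The proximity variables used in the enrolment model rest on a great-circle distance between an applicant’s residence and each campus. The following is a minimal Python sketch of that calculation, assuming coordinates are supplied in decimal degrees and converted to radians inside the functions. It shows both the arccosine form quoted in the slides and the half-angle haversine form. The function names and the sample coordinates are illustrative assumptions, not taken from the project.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, as used in the slide formula

def great_circle_km(lat1, lon1, lat2, lon2):
    """Arccosine (spherical law of cosines) form; inputs in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    cos_angle = (math.sin(p1) * math.sin(p2)
                 + math.cos(p1) * math.cos(p2) * math.cos(dlon))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.acos(cos_angle) * EARTH_RADIUS_KM

def haversine_km(lat1, lon1, lat2, lon2):
    """Haversine (half-angle) form of the same distance; inputs in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = (math.sin(dlat / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))

# One degree of latitude is about 111.19 km on a sphere of radius 6371 km.
print(round(haversine_km(0.0, 0.0, 1.0, 0.0), 2))

# Hypothetical residence and campus coordinates (illustrative only).
home = (-35.0, 138.5)
campus = (-34.9, 138.6)
print(great_circle_km(*home, *campus), haversine_km(*home, *campus))
```

The two forms agree to within floating-point error. The haversine form is usually preferred for very short distances because the arccosine form loses precision when its argument is close to 1.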
Predicting Topic Enrolments
• Planning Services was approached by the School of Nursing to predict future topic enrolments to assist in resource and placement management.
• Primary focus on predicting topic enrolments for 2nd year undergraduate nursing topics.
• A largely deterministic program, complicated by pre-requisites, large numbers of advance credit 2nd year commencers, relatively high percentages of part-time students and a lack of historical training data due to a major course restructure in 2013.

Predicting Topic Enrolments
• A similar machine learning (decision tree) approach was adopted; however, input variables consisted only of course code, attendance type and previous topics passed (no student demographic or BOA information).
• Binary target variable: 1 = did enrol in target topic, 0 = did not enrol in target topic.
• Under the new program, 2nd year topics are being run for the first time in 2014, so only 1st year 2013 students are available to train and test on. Test results were promising and a model was developed to predict topic enrolments for 2015.
• Predictions for all seven 2nd year nursing topics were provided and validated by the School as being consistent with their estimates.
• The School of Nursing has requested that the approach become part of its standard business process in future years, and discussions are underway as to how Planning Services can meet this request.
• School of Education, Humanities and Law has provided 12 topics of interest to help Planning Services further develop the approach within a less constrained course structure.

In Conclusion
• Learning analytics is still in its infancy.
• The Student Success Project, Topic Enrolment and University Enrolment Prediction projects have demonstrated some early promise.
• Across the University we have the technical expertise and strong management support to progress learning analytics at Flinders.
• Particularly keen to work with the faculties to progress analytics in support of the teaching function.
• Performing research-like activities within an operational environment – looking for trailblazers without the fear of failure.
• We’re keen, enthusiastic and we’re here to help!