Learning Analytics – Tools &
Techniques For Analysing Large
Volumes Of Educational Data
Dr. Richard Price,
Data Scientist,
Planning Services,
Flinders University
History of The Techniques and Methods of
Learning Analytics
•
Learning analytics draws upon techniques from a number of established fields:
– Statistics
– Artificial Intelligence
– Machine Learning
– Data mining
– Social Network Analysis
– Text Mining and Web Analytics
– Operational Research
– Information Visualization
•
Application domains such as business intelligence, national security intelligence
and learning analytics all have an interest in analysing large volumes of data
from disparate data sources and are providing the business cases for the rapid
growth in ‘big data’ & data analytics.
•
Learning analytics encompasses support to both the business and teaching
functions of the learning institution.
Data Types
•
Structured data
– Typically stored in databases or spreadsheets and managed in accordance
with a standardised storage format and ontology, e.g. names, place names
– E.g. SATAC applications, load, enrolments, FLO usage data
•
Unstructured data
– text, audio, imagery, video
– E.g. student email, chat rooms, questionnaire responses, lecture videos (audio &
video)
•
Different data types lend themselves to different analytical techniques. Unstructured data
often requires pre-processing to enable structured data analysis.
•
Unstructured data analysis
– Text : document clustering, topic detection, entity extraction (people, places,
locations, dates, times etc.), sentiment analysis (+,-)
– Audio : speaker identification, language identification, speech to text, keyword spotting
– Video analysis : face recognition, object recognition, target tracking
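As a rough illustration of the pre-processing step mentioned above, the sketch below turns free text into a table of word counts that structured techniques can then work on. The responses are made up for illustration, and the `bag_of_words` helper is a hypothetical name, not part of any project described in these slides.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text, split it into word tokens and count them."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Hypothetical questionnaire responses, invented for this example.
responses = [
    "The lecture videos were really helpful",
    "Lecture slides were hard to follow",
]

# One Counter of word counts per response.
counts = [bag_of_words(r) for r in responses]

# Shared vocabulary across all responses, in alphabetical order.
vocabulary = sorted(set().union(*counts))

# Structured form: one row per response, one column per vocabulary word.
matrix = [[c[word] for word in vocabulary] for c in counts]
```

Each row of `matrix` can then be fed to clustering or classification methods in the same way as any other structured record.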
Structured Data Analysis
Descriptive statistics – sums, means, std devs, basic plotting
(graphs, charts, histograms)
Data visualisation –
tools that enable the human to see meaningful patterns in data
Machine learning –
tools that enable computers to find patterns in data to perform
classification, clustering or prediction,
e.g. decision trees, neural networks, support vector machines,
linear regression, self-organising maps, k-means
Predictive analytics –
Algorithmic approaches (generally machine learning) for
predicting key target variables of interest.
Example LA projects: Identification of ‘at risk’ students - Student
Success Project, Future University enrolments, topic enrolments
Data Visualisation
Structured Data
Unstructured Data
Advanced Data Visualisation
Combining Structured & Unstructured Data Sources
Predicting Enrolments From Applications Data
•
Aim: To predict next year’s commencing load using the past 3 years of
SATAC applications data.
•
Predictions based at the applicant level – not time series based.
•
Adopted a decision tree machine learning based approach.
•
Input variables for each applicant included: academic performance,
schooling, demographics (e.g. age, gender and postcode), information
regarding each of the applicant’s preferences (preference
number, course, institution and institution campus) and a number of
proximity variables.
•
Output (target) variable : whether the student was enrolled at Flinders
University at Semester 1 census.
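To make the applicant-level framing concrete, here is a minimal sketch of what one training record and one way of rolling predictions up to a total might look like. The field names, the `ApplicantRecord` class and the `expected_enrolments` helper are all hypothetical, and the slides do not say how individual predictions were aggregated into a commencing load figure; summing per-applicant probabilities is only one plausible choice, and it gives a headcount rather than a load measure.

```python
from dataclasses import dataclass


@dataclass
class ApplicantRecord:
    """One row of training data; all field names are illustrative assumptions."""
    age: int
    postcode: str
    preference_number: int
    course: str
    institution: str
    km_to_campus: float        # one of several proximity inputs
    enrolled_at_census: int    # target: 1 = enrolled at Semester 1 census, 0 = not


def expected_enrolments(probabilities):
    """Sum per-applicant enrolment probabilities into an expected headcount.

    Assumption: the model outputs a probability for each applicant.
    """
    return sum(probabilities)


# Hypothetical model outputs for three applicants.
predicted = [0.9, 0.2, 0.4]
total = expected_enrolments(predicted)
```

Because every applicant is scored separately, the same records can also be grouped by course or campus before summing.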
Predicting Enrolments From Applications Data
•
The three P’s - Prestige, Proximity & Price
•
Proximity input variables
– For two given points P1 = (lat1, lon1), P2 = (lat2, lon2), with latitudes and longitudes
in radians, the haversine distance in kilometres between P1 and P2 is defined as:
d(P1,P2) = ACOS(SIN(lat1)*SIN(lat2) + COS(lat1)*COS(lat2)*COS(lon2 - lon1)) * 6371
– Haversine distance calculated between applicant’s primary residence and all
SA major University campuses, with each value being an input into the
machine learning algorithm.
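The distance calculation can be sketched in Python as follows. The ACOS expression on the slide is the spherical law of cosines form of the great-circle distance; the haversine formula proper gives the same distance and is better behaved numerically for very short distances, so both are shown. The function names and the example coordinates are assumptions for illustration only, not actual residence or campus locations.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, as used on the slide


def law_of_cosines_km(lat1, lon1, lat2, lon2):
    """Great-circle distance using the ACOS form; inputs in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    cos_angle = (math.sin(p1) * math.sin(p2)
                 + math.cos(p1) * math.cos(p2) * math.cos(dlon))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.acos(cos_angle) * EARTH_RADIUS_KM


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance using the haversine formula; inputs in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = (math.sin(dlat / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Hypothetical coordinates: one residence and two campuses.
home = (-34.90, 138.60)
campuses = {"campus_a": (-35.00, 138.50), "campus_b": (-34.80, 138.70)}

# One proximity input per campus, keyed by a made-up feature name.
proximity = {f"km_to_{name}": haversine_km(*home, *coords)
             for name, coords in campuses.items()}
```

Note that both functions convert degrees to radians first; passing degrees straight into the slide's formula would give wrong distances.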
•
Two models developed: a) from the 1st week in September; b) from the 2nd week in
January.
•
Training data consisted of 3 years of data (2011, 2012 & 2013) to predict 2014
enrolments - 25,551 training examples for September and 74,516 for January.
•
A number of commonly used machine learning algorithms could have been used;
we chose to adopt a CHAID decision tree algorithm.
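CHAID (Chi-squared Automatic Interaction Detection) chooses its splits using chi-square tests of association between a categorical predictor and the target. The sketch below shows only that core statistic for a contingency table; a full CHAID implementation also merges similar categories and adjusts p-values, which is omitted here. The function name and the counts are hypothetical, and every row and column total is assumed to be non-zero.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table.

    `table` is a list of rows of observed counts, e.g. rows are categories
    of a predictor and columns are enrolled / not enrolled.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if predictor and target were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts: rows = first preference yes/no, columns = enrolled yes/no.
strong_split = chi_square_statistic([[30, 10], [10, 30]])
no_signal = chi_square_statistic([[10, 20], [20, 40]])
```

A larger statistic indicates a stronger association, so a predictor like the first table would be preferred as a split over one like the second.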
Predicting Enrolments From Applications Data
• Results :
Model     | Number Of Applicants (Predictions) | Predicted Commencing Load | Actual Commencing Load | % Error
September | 8557                               | 1394                      | 1365                   | 2.1
January   | 26457                              | 4340                      | 3858                   | 12.5
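The % Error figures are consistent with the difference between predicted and actual load, expressed as a percentage of the actual load. A small check, using only the numbers in the results above (the `percent_error` name is just for this example):

```python
def percent_error(predicted, actual):
    """Prediction error as a percentage of the actual value."""
    return (predicted - actual) / actual * 100


# (predicted, actual) commencing load for each model, taken from the results.
results = {"September": (1394, 1365), "January": (4340, 3858)}

errors = {model: round(percent_error(p, a), 1) for model, (p, a) in results.items()}
# errors == {"September": 2.1, "January": 12.5}
```

Both models over-predicted, so the signed error is positive in each case.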
• Lift Versus Output Percentile Profiles For the September Model
[Figure: lift profile panels for Training and Validation]
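A lift profile of this kind can be computed by ranking cases by model score and comparing the enrolment rate in the top-ranked portion with the overall rate. The sketch below computes cumulative lift; whether the original profiles were cumulative or per-bin is not stated, so that choice, the function name and the example data are all assumptions. It also assumes at least as many cases as bins and at least one positive outcome.

```python
def cumulative_lift(scores, outcomes, n_bins=10):
    """Cumulative lift for the top 1/n_bins, 2/n_bins, ... of cases ranked by score.

    `scores` are model outputs (higher = more likely to enrol) and
    `outcomes` are the matching 0/1 actual results.
    """
    ranked = [y for _, y in sorted(zip(scores, outcomes),
                                   key=lambda pair: pair[0], reverse=True)]
    overall_rate = sum(ranked) / len(ranked)
    lifts = []
    for b in range(1, n_bins + 1):
        top = ranked[: round(len(ranked) * b / n_bins)]
        lifts.append((sum(top) / len(top)) / overall_rate)
    return lifts


# Hypothetical scores and outcomes for ten applicants.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
outcomes = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

lifts = cumulative_lift(scores, outcomes, n_bins=5)
```

A lift above 1 in the top bins means the model concentrates actual enrolments among its highest-scored applicants; the final value is always 1 because it covers every case. Similar curves on training and validation data suggest the model is not simply memorising the training set.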
Predicting Enrolments From Applications Data
•
The strong consistency of the lift profiles between the training results and
the test and validation results is indicative of structural patterns of
behaviour that appear to exist across applicants to South Australian
universities.
•
These patterns of behaviour appear to be captured by the rules
contained within the decision trees produced during the training stage
of the modelling process.
•
Paper reporting this work accepted for presentation at the Australian
Association for Institutional Research Forum in November & possible
publication in the Journal of Institutional Research.
•
If future years’ performance proves to be similar, the approach should
be able to provide valuable support to the management of the
applications process.
Predicting Topic Enrolments
• Planning Services approached by the School of Nursing to predict
future topic enrolments to assist in resource and placement
management.
• Primary focus on predicting topic enrolments for 2nd year
undergraduate nursing topics.
• Largely deterministic program complicated by pre-requisites,
large numbers of advance credit 2nd year commencers,
relatively high percentages of part-time students and a lack of
historical training data due to a major course restructure in
2013.
Predicting Topic Enrolments
•
Similar machine learning (decision tree) approach adopted; however, input variables
consisted only of course code, attendance type and previous topics passed (no student
demographic or BOA information).
•
Binary target variable: 1 = did enrol in target topic, 0 = did not enrol in target topic.
•
Under the new program, 2nd year topics were run for the first time in 2014, so only 1st
year 2013 students were available to train and test on. Testing gave promising results and a model
was developed to predict topic enrolments for 2015.
•
Predictions for all seven 2nd year nursing topics were provided and validated by the School
as being consistent with their estimates.
•
The School of Nursing has requested that the approach become part of its standard
business process in future years, and discussions are underway as to how Planning
Services can meet this request.
•
The School of Education, Humanities and Law has provided 12 topics of interest to help
Planning Services further develop the approach within a less constrained course structure.
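For the topic enrolment models, the three inputs (course code, attendance type and previous topics passed) need to be laid out as one flat record per student before a decision tree can use them. A minimal sketch of one possible encoding follows; the `encode_student` helper, the topic codes and the course code are hypothetical, and the slides do not describe how the inputs were actually encoded.

```python
def encode_student(course_code, attendance_type, topics_passed, all_topics):
    """Build a flat feature record with one 0/1 flag per previously offered topic."""
    features = {"course_code": course_code, "attendance_type": attendance_type}
    for topic in all_topics:
        features[f"passed_{topic}"] = int(topic in topics_passed)
    return features


# Hypothetical 1st year topic codes and one hypothetical student.
first_year_topics = ["TOPIC_A", "TOPIC_B", "TOPIC_C"]
row = encode_student(
    course_code="COURSE_X",
    attendance_type="part-time",
    topics_passed={"TOPIC_A", "TOPIC_C"},
    all_topics=first_year_topics,
)

# The binary target is stored separately: 1 = enrolled in the target topic, 0 = not.
target = 1
```

One flag per topic lets the tree learn pre-requisite-like rules directly, such as "passed TOPIC_A and TOPIC_C" leading to enrolment in a given 2nd year topic.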
In Conclusion
•
Learning analytics is still in its infancy.
•
The Student Success Project, Topic Enrolment and University
Enrolment Prediction projects have demonstrated some early promise.
•
Across the University we have the technical expertise and strong
management support to progress learning analytics at Flinders.
•
Particularly keen to work with the faculties to progress analytics in
support of the teaching function.
•
Performing research-like activities within an operational environment –
looking for trailblazers without the fear of failure.
•
We’re keen, enthusiastic and we’re here to help!