Locating Discussion Practices in Computational Models of Text
Carolyn P. Rosé, Miaomiao Wen, & Diyi Yang, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh
PA, 15213, {cprose,mwen,diyiy}@cs.cmu.edu
Abstract: Language modeling techniques such as Latent Semantic Analysis (LSA) (Dumais
et al., 1988) and Latent Dirichlet Allocation (LDA) (Blei, Ng, & Jordan, 2003) have been
used to construct easy-to-obtain lenses on discussion behavior. They are desirable because
they do not require labeled training data and are thus thought of as an easy way to perform a
meaningful analysis of text. In this paper we contrast an LDA-style analysis that
approximates an analysis of motivation and cognitive engagement with an analysis making
use of hand-coded training data to model these phenomena directly. In both cases, we provide
a demonstration that the automated indicators have predictive validity in connection with
attrition over time, although the demonstration is stronger in the case of the direct modeling
approach. We conclude with a discussion of the trade-offs between these two approaches and
implications for future work in the area of Learning Analytics applied to discussion data.
Introduction
As the field of Learning Analytics matures, it seeks to move beyond shallow observations of student behavior
that are easy to measure, such as number of posts, number of back clicks, or time spent watching videos. It
reaches instead for more meaningful latent factors that these observational variables may reflect, like
commitment, metacognitive awareness, or cognitive engagement. The concept of a practice fits within this class
of more abstract notions. Practices encompass more than isolated behaviors. They reflect identity, goals, and
intentionality.
In this paper, we focus specifically on discussion practices that are relevant in a learning context. In
particular, we focus on practices of signaling motivation and cognitive engagement. It is not controversial that
motivation and cognitive engagement are student orientations that are important for learning. As students
interact with one another in a course discussion, they reflect their state of motivation and cognitive engagement
in subtle or overt ways. At times, it may be part of rapport building or support seeking to engage in
commiserating about a struggle to find the wherewithal to persist in a course. At other times, it may be a way of
projecting an image of a competent student to reflect high motivation to succeed. It is not surprising to find
students who appear unmotivated and then drop out of a course.
For a number of reasons, it may be strategic to computationally model these practices. For example, if
it is possible to identify students with low motivation or low cognitive engagement, it may be easier for
instructors or mentors to identify which students are vulnerable so they can allocate their mentoring and support
accordingly. Alternatively, reflecting back to students their detected levels of motivation and cognitive
engagement might support their metacognitive awareness and self-regulation.
The field of language technologies, and text mining in particular, offers a variety of modeling
approaches that may be adopted and applied within the area of Learning Analytics. In this paper, we contrast
two alternative approaches. In a top-down approach, we model motivation and cognitive engagement directly,
using carefully constructed, meaningful knowledge sources and hand-coded training data. In a bottom-up
approach we come to a model indirectly by applying an exploratory modeling approach and identifying latent
factors that turn out to reflect motivation and cognitive engagement. Both approaches show some measure of
predictive validity. We provide a discussion comparing and contrasting what is gained and lost in these two
alternative approaches and conclude with some directions for future research.
Computational Modeling Approaches
In this paper, we compare two different approaches to modeling discussion practices associated with learning.
First, we model motivation and cognitive engagement directly, using carefully constructed, meaningful
knowledge sources and hand-coded training data. Second, we apply an exploratory modeling approach and
identify latent factors that turn out to reflect motivation and cognitive engagement. In both cases, in order to
validate the indicators provided by the models, we used a survival model to measure the extent to which the
indicators predict attrition over time (Rabe-Hesketh & Skrondal, 2012). Survival analysis is known to provide
less biased estimates than simpler techniques (e.g., standard least squares linear regression) that do not take into
account the potentially censored nature of time-to-event data (e.g., users who had not yet left the community at
the time of the analysis but might at some point subsequently). From a more technical perspective, a survival
model is a form of proportional odds logistic regression, where a prediction about the likelihood of a failure
occurring is made at each time point based on the presence of some set of predictors.
The estimated weights on the predictors are referred to as hazard ratios. The hazard ratio of a predictor
indicates how the relative likelihood of the failure occurring increases or decreases with an increase or decrease
in the associated predictor. A hazard ratio greater than 1 signifies that a higher than average measure of an
independent variable is predictive of higher than average dropout at the next time point. In particular,
subtracting 1 from the hazard ratio indicates what percentage more likely a participant is estimated to be to
drop out at the next time point if the value of the associated independent variable is 1 standard deviation
higher than average. For example, a hazard ratio of 2 indicates a doubling of probability. A hazard ratio
between 0 and 1 signifies that a higher than average measure of an independent variable is predictive of lower
than average dropout at the next time point. In particular, if the hazard ratio is .3, then a participant is 70% less
likely to drop out at the next time point if the value of the associated independent variable is 1 standard
deviation higher than average for that student.
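To make the interpretation concrete, the following is a minimal sketch of a discrete-time survival analysis fit
as a logistic regression over person-period records, using the Python statsmodels library. The data frame, file
name, and column names are hypothetical; the analyses reported in this paper were conducted with the
survival modeling facilities described in Rabe-Hesketh & Skrondal (2012), not with this code.

import numpy as np
import pandas as pd
import statsmodels.api as sm

# One row per student per time point of active participation; `dropped`
# is 1 in the period the student leaves and 0 otherwise (hypothetical file).
periods = pd.read_csv("student_periods.csv")

# Standardize predictors so exponentiated coefficients read as the
# change in odds per standard deviation above average.
predictors = periods[["motivation", "cog_engagement"]]
X = sm.add_constant((predictors - predictors.mean()) / predictors.std())
y = periods["dropped"]

fit = sm.Logit(y, X).fit()

# In this discrete-time formulation the exponentiated coefficients play
# the role of hazard ratios: e.g., .53 means a student one standard
# deviation above average is 1 - .53 = 47% less likely to drop out at
# the next time point.
print(np.exp(fit.params))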
In preparation for a partnership with an instructor team for a Coursera MOOC that was launched in Fall
of 2013, we were given permission by Coursera to extract and study the discussion data from a small number
of courses. One of those courses was a Python programming course, “Learn to Program: The Fundamentals”,
offered in August 2013, which had 3,590 active users and 24,963 forum posts. The course ran for seven weeks
and included seven week-specific subforums plus a separate general subforum for more general discussion
about the course. Our analysis is limited to behavior within the discussion forums.
Direct Modeling of Motivation and Cognitive Engagement
We have developed and validated a targeted approach to measuring motivation towards a course and cognitive
engagement in our prior work (Wen et al., 2014). Here we summarize that work. In order to model motivation,
we extracted roughly 1,000 posts altogether from two other MOOCs. We used Amazon’s Mechanical Turk
service to have the posts hand annotated for the displayed level of motivation towards the course on a Likert
scale. We evaluated the reliability of the hand coding using the Intra-Class Correlation, which was high (> .7).
Next we used a machine learning model to learn to predict high versus low motivation based on these hand
codings, and we applied this model to all of the data in the Python MOOC. As a measure of cognitive
engagement we used a carefully constructed, publicly available dictionary of abstractness that can be used to
score words based on how abstract/concrete they are on a Likert scale (Beukeboom, Tanis, & Vermeulen,
2013). We averaged
the scores of words within a post to derive an abstractness score, which we considered to be a measure of
cognitive engagement with the material. In our survival analysis, both the motivation indicator and the cognitive
engagement measure were found to predict lower attrition. In particular, the hazard ratio associated with
motivation was .84, which signifies that students whose posts were rated as high motivation at a time point were
16% less likely than average to drop out at the next time point. The hazard ratio associated with cognitive
engagement was .53, indicating that students whose posts reflected a standard deviation higher measure of
cognitive engagement at a time point were 47% less likely than average to drop out at the next time point.
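As an illustration only, the following sketch shows the shape of these two measures: a generic text classifier
standing in for the motivation model (the actual feature set and learner from Wen et al. (2014) are not
reproduced here), and a dictionary-based abstractness score. All file names are hypothetical, and the
tokenization is deliberately simplified.

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Motivation: train on the ~1,000 hand-annotated posts, then label
# every post in the target MOOC (files are hypothetical).
labeled = pd.read_csv("annotated_posts.csv")
vectorizer = TfidfVectorizer(min_df=5)
clf = LogisticRegression().fit(
    vectorizer.fit_transform(labeled["text"]), labeled["motivation"])

posts = pd.read_csv("python_mooc_posts.csv")
posts["motivation"] = clf.predict(vectorizer.transform(posts["text"]))

# Cognitive engagement: average the per-word ratings from an
# abstractness dictionary over the words of each post.
ratings = pd.read_csv("abstractness_dict.csv")  # columns: word, rating
abstractness = dict(zip(ratings["word"], ratings["rating"]))

def abstractness_score(text):
    scores = [abstractness[w] for w in text.lower().split() if w in abstractness]
    return sum(scores) / len(scores) if scores else float("nan")

posts["cog_engagement"] = posts["text"].map(abstractness_score)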
Probabilistic Graphical Modeling Approach
The probabilistic graphical model that we use in an exploratory way in this work integrates two types of
previously developed probabilistic graphical models (Yang et al., under review). First, in order to obtain a soft partitioning
of the social network of the discussion forums, we used a Mixed Membership Stochastic Blockmodel (MMSB)
(Airoldi et al., 2008). The advantage of MMSB over other graph partitioning methods is that it does not force
assignment of students solely to one subcommunity. The model can track the way students move between
subcommunities during their participation. We have linked the community structure that is discovered by the
model with a probabilistic topic model, so that for each person a distribution of identified communicative
themes is estimated that mirrors the distribution across subcommunities. By integrating these two modeling
approaches so that the representations learned by each are pressured to mirror one another, we are able to learn
structure within the text portion of the model that helps identify the characteristics of within-subcommunity
communication that distinguish various subcommunities from one another. The topic model component builds
on a well known approach, Latent Dirichlet Allocation (LDA) (Blei et al., 2003), a generative model that is
effective for uncovering the thematic structure of a document collection. In an LDA model, each latent word class is represented as a
distribution of words. The words that rank most highly in the distribution are the words that are treated as most
characteristic of the associated latent class, or topic.
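The integrated model of Yang et al. is not reproduced here, but a minimal sketch of the plain LDA component,
using scikit-learn and a hypothetical posts file, illustrates where the per-topic word distributions and the top
ranking words come from.

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

posts = pd.read_csv("python_mooc_posts.csv")  # hypothetical file
counts = CountVectorizer(stop_words="english", min_df=5)
doc_term = counts.fit_transform(posts["text"])

# Twenty topics, matching the number of subcommunities used below.
lda = LatentDirichletAllocation(n_components=20, random_state=0)
doc_topics = lda.fit_transform(doc_term)  # one topic distribution per post

# The top ranking words per topic are the highest-probability entries
# in each topic's word distribution.
vocab = counts.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top_words = vocab[weights.argsort()[::-1][:10]]
    print(f"Topic{k}: " + ", ".join(top_words))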
An important parameter that must be set prior to application of the modeling framework is the number
of subcommunities to identify. In this set of experiments, we set the number to twenty for each MOOC in order
to enable the models to identify a diverse set of subcommunities reflecting different compositions in terms of
content focus, participation goals, and time of initiating active participation. The trained model identifies a
distribution of subcommunity participation scores across the twenty subcommunities for each student on each
thread. Thus we are able to construct a subcommunity distribution for each student for each week of active
participation in the discussion forums by averaging the subcommunity distributions for that student on each
thread that student participated in that week. In this analysis we refer to student-weeks because for each
student, for each week of their active participation in the discussion forum, we have one observational vector
that we treat as one data point. The text associated with that student-week contains all of the messages posted
by that student during that week. We will use our integrated model to identify themes in these student-weeks by
examining the student-weeks that have high scores for the topics that showed significantly higher or lower than
average attrition in the quantitative analysis. We identified four such topics, referred to as Topic9 (Hazard ratio
1.06), Topic13 (Hazard ratio .95), Topic17 (Hazard ratio 1.09), and Topic18 (Hazard ratio .95) below. In
contrast to the effect of the directly modeled variables, these hazard ratios indicate a weaker effect, between 5%
and 10%.
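The construction of the student-week vectors amounts to a simple averaging step. The following is a minimal
sketch under assumed data structures; the per-thread distributions would come from the trained integrated
model, and the variable names are hypothetical.

import numpy as np
from collections import defaultdict

def student_week_distributions(thread_dists, n_topics=20):
    # thread_dists: iterable of (student_id, week, distribution) triples,
    # where distribution is an array over the n_topics subcommunities.
    sums = defaultdict(lambda: np.zeros(n_topics))
    counts = defaultdict(int)
    for student, week, dist in thread_dists:
        sums[(student, week)] += dist
        counts[(student, week)] += 1
    # Average the per-thread distributions into one vector per student-week.
    return {key: sums[key] / counts[key] for key in sums}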
When an LDA model is trained, the most visible output that represents that trained model is a set of
word distributions, one associated with each topic. That distribution specifies a probabilistic association
between each word in the vocabulary of the model and the associated topic. Top ranking words are most
characteristic of the topic, and lowest ranking words are hardly representative of the topic at all. Typically when
LDA models are used in research such as presented in this paper, a table is offered that lists associations
between topics and top ranking words, sometimes dropping words from the list that don’t form a coherent set in
connection with the other top ranking words. The set of words is then used to identify a theme. In our
methodology, we did not interpret the word lists out of the context of the textual data that was used to induce
them. Instead, we used the model to retrieve messages that fit each of the identified topics using a maximum
likelihood measure and then assigned an interpretation to each topic based on the association between topics and
texts rather than directly to the word lists. Word lists on their own can be misleading, especially with an
integrated model like our own, where a student may get a high score for a topic within a week more because
of who he was talking to than because of what he was saying. We will see that, at best, the lists of top ranking words
bore an indirect connection with the texts in top ranking student-weeks. However, we do see that the texts
themselves that were associated with top ranking student-weeks were nevertheless thematically coherent.
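Concretely, this retrieval step can be as simple as ranking the student-weeks on their score for a topic and
reading the associated texts. A sketch follows, with hypothetical structures keyed by (student, week) pairs.

def top_student_weeks(distributions, texts, topic, n=10):
    # distributions: {(student, week): topic vector}; texts: the pooled
    # messages for that student-week (hypothetical structures).
    ranked = sorted(distributions,
                    key=lambda key: distributions[key][topic], reverse=True)
    return [(key, texts[key]) for key in ranked[:n]]

# For example, to read the student-weeks most associated with Topic9:
# for key, text in top_student_weeks(dists, texts, topic=9):
#     print(key, text[:200])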
Because LDA is an unsupervised language processing technique, it would not be reasonable to expect
that the identified themes would exactly match human intuition about the organization of topics, and yet as a
technique that models word co-occurrence associations, it can be expected to identify some things that would
make sense as thematically associated. In this light, we examine sets of posts that the model identifies as
strongly associated with each of the topics identified as predicting significantly more or less dropout in the
survival analysis, and then for each one, identify a coherent theme. Apart from the insights we gain about
survival analysis, and then for each one, identify a coherent theme. Apart from the insights we gain about
reasons for attrition from the qualitative analysis, what we learn at a methodological level is that this new
integrated model identifies coherent themes in the data, in the spirit of what is intended for LDA, and yet the
themes may not be represented strictly in word co-occurrences. And thus, we must interpret this integrated
model with more care than a typical LDA model.
What is interesting about the Python course is that we have topics within the same course, some of
which predict higher attrition and others that predict lower attrition, so we can compare them to see what is
different in their nature. In each case, we see that the top ranking words in the topic and the topic theme as
identified from top ranking student-weeks bore little connection to one another, although we do see some
inklings of connection at an abstract level. Topics that signified higher than average attrition were
more related to getting set up for the course, and possibly indicating confusion with course procedures. Topics
that signaled lower than average attrition were ones where students were deeply engaged with the content of the
course, working together towards solutions. The interactions between students in the discussions associated
with higher attrition were not particularly dysfunctional as discussions; they simply lacked a mentoring
component that might have helped the struggling students to get past their initial hurdles and make a personal
connection with the substantive course material.
Topic9 [more attrition]. Top ranking words included keyword, trying, python, formulate, toolbox,
workings, coursera, vids, seed, and tries. The top ranking student-weeks contained many requests to be added
to study groups. But in virtually all of these cases, that was the last message posted by the student that week.
Similarly, a large number of these student-weeks included an introduction and no other text. What appears to
unify these student-weeks is that these are students who came into the course, made an appearance, but were
not very quick to engage in discussions about the material. Some exceptions within the top ranking
student-weeks were requests for help with course procedures.
Topic13 [less attrition]. Top ranking words include name error, uses, mayor, telly, setattr, hereby,
gets, could be, every time, and adviseable. In contrast to Topic9, this topic contained many top ranking
student-weeks with substantial discussion about course content. We see students discussing their struggles with the
assignment, but not just complaining about confusion. Rather, we see students reasoning out solutions together.
For example, “So 'parameter' is just another word for 'variable;' and an 'argument' is a specific value given to the
variable. Okay; this makes a lot more sense now.” or “For update_score(): Why append? are you adding a new
element to a list? You should just update the score value.”
Topic17 [more attrition]. Top ranking words include was beginner, amalgamate, thinking, defaultdef,
less, Canada, locating, fundamentalist, only accountable, and English. Like Topic9, this topic contains many
top ranking student-weeks with requests to join study groups as the only text for the week. The substantive
technical discussion was mainly related to getting set up for the course rather than about Python programming
per se, for example “Hi;I am using ubuntu 12.04. I have installed python 3.2.3 Now my ubuntu12.04 has two
version of python. How can I set default version of 3.2.3Please reply.” or “For Windows 8 which version should
I download ?Downloaded Python 3.3.2 Windows x86 MSI Installer?and I got the .exe file with the prompter ...
but no IDLE application”.
Topic18 [less attrition]. Top ranking words include one contribution, accidental, workable, instance,
toolbox, wowed, meant, giveaway, patient, and will accept. Like Topic13, we see a great deal of talk related to
problem solving, for example “i typed s1.find(s2;s1.find(s2)+1;len(s1)) and i can't get why it tells me it's
wrong? do not use am or pm.... 3am=03:00 ; 3pm=15:00”, or “I don't see why last choice doesn't work. It is
basically the same as the 3rd choice. got it! the loop continues once it finds v. I mistakenly thought it breaks
once it finds v. thanks!”. The focus was on getting code to work. Perhaps “workable” is the most representative
of the top ranking words.
In applying this integrated model that brings together a view of the data from a social network
perspective with a complementary view from the text contributed by students in their threaded discussions, we
are able to identify emergent subcommunity structure that distinguishes subcommunities with differential rates
of attrition. A qualitative post-hoc analysis suggests that subcommunities associated with
higher attrition demonstrate lower comfort with course procedures and lower expressed motivation and
cognitive engagement with the course materials, whereas subcommunities associated with lower attrition reflect
higher motivation and cognitive engagement, which is consistent with the results obtained through the direct
modeling approach.
Discussion and Conclusions
It is not at all surprising that the approach that required more time to develop, namely the direct modeling
approach, yielded stronger results quantitatively. Nevertheless, the analysis presented in this paper offers some
lessons learned and directions for future work. The first important take home message is that although the
exploratory model produced meaningful results, that does not mean that one could not do better with a more
carefully crafted measure. Researchers should consider carefully how much resolution into their data they are
losing if they choose to take an “effortless” approach. The second important take home
message is that it can be dangerous to read too much into the lists of top ranking words per topic that come out
of LDA variants, although it may again be a tempting shortcut. Nothing replaces actually going back to the data
to see what structure the model is really picking up.
Perhaps the most important take home message is that while the top ranking words per topic did not
turn out to be well represented in top ranking posts, we do find some connection with the more abstract themes
that the topics were revealed to pick up on by leveraging the network structure. We see
here evidence that the network structure has the potential to make an important contribution to the interpretation
of the text. While pure text based approaches like standard LDA rely entirely on word co-occurrences, we see
here that word co-occurrences and word overlap may miss collections of thematically related posts where the
relationship is not reflected in the words because the commonality transcends individual words and operates at
the level of functional word classes, such as words that signal emotion or words used as greetings. It is this last
take home message that is the most critical for modeling discussion practices. What we see here is that if we
desire to push further on exploratory models that are useful for identification of practices rather than just
low-level observed behaviors, we should explore further how network structure may be leveraged to raise the
level of awareness of the models above the limited notion of commonality found in simple word co-occurrences.
References
Airoldi, E., Blei, D., Fienberg, S., & Xing, E. P. (2008). Mixed membership stochastic blockmodels. Journal of
Machine Learning Research, 9, 1981-2014.
Beukeboom, C. J., Tanis, M., & Vermeulen, I. E. (2013). The language of extraversion: Extraverted people talk
more abstractly, introverts are more concrete. Journal of Language and Social Psychology, 32(2),
191-201.
Blei, D., Ng, A., & Jordan, M. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3,
993-1022.
Dumais, S. T., Furnas, G. W., et al. (1988). Using latent semantic analysis to improve access to textual
information. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems.
Washington, D.C.: ACM.
Koller, D. & Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques, The MIT Press.
Rabe-Hesketh, S., & Skrondal, A. (2012). Multilevel and Longitudinal Modeling Using Stata (Volumes I and
II). College Station, TX: Stata Press.
Wen, M., Yang, D., & Rosé, C. P. (2014). Linguistic reflections of student engagement in massive open online
courses. In Proceedings of the International Conference on Weblogs and Social Media (ICWSM).