Video-Based Learning Analytics for Epistemic Frame Analysis in
Semi-Structured Interviews
Alejandro Andrade-Lotero, Ginette Delandshere, and Joshua A. Danish, Indiana University Bloomington
Abstract: Developing analytical tools for coded qualitative observations is of increasing
interest to researchers in the learning sciences. We present examples of new analytical tools
for the study of learning practices based on fine-grain, video-based interaction analysis, which
enable researchers to analyze patterns in large samples of coded data. Researchers can
display visualizations, build stochastic models to assess the relative contribution of features,
and identify other episodes in the rest of the data that have not been manually coded.
Researchers study student epistemologies because they are interested in understanding students’ strategies for
producing scientific explanations (Redish, 2004). The assumption is that students sometimes bring the
appropriate set of resources to bear when developing their explanations and sometimes do not. Epistemic frames are
useful in understanding whether students develop conceptual understanding and sophisticated views of
knowledge and learning in their science courses (Scherr & Hammer, 2009). As part of students’ conceptual
ecologies, student framing is regarded as a basis for judging the stability and consistency of students’ scientific
explanations (Russ, Lee, & Sherin, 2012).
As an alternative to self-reports, researchers who study the contextual activation of epistemic resources
consider it more appropriate to observe participants’ communicative moves as well as other behavioral and
paralinguistic features during learning activities (Scherr & Hammer, 2009). In studying these episodes,
researchers rely on the observation of mutually reinforcing behaviors that reveal participants’ underlying sense
of what the activity is calling for with respect to knowledge (Scherr & Hammer, 2009). The assumption is that
these clusters of behaviors are interpretable not only to participants themselves but also to the researcher who
observes such interactions (Jordan & Henderson, 1995). Conversation and interaction analyses are the
predominant methodological strategies because researchers must infer students’ understanding of the
interaction indirectly. This assumption rests on the view that knowledge and learning are fundamentally based in
social practices and material ecologies (Lave & Wenger, 1991).
For instance, Russ et al. (2012) selected a set of verbal, non-verbal, and prosodic behaviors that were
useful in determining student epistemic framing. In particular, they attended to students’ body position (leaning
forward or backward), their gaze (at the interviewer or not), their gesturing (prolific or not), their hedging language
(“Mmm, I don’t know”), and the clarity of their speech (tone and speed). With this method, Russ et al. (2012)
found that in the course of cognitive interviews, students tend to switch back and forth between at least three
different kinds of epistemic frames, which may be more or less productive in terms of the quality and
sophistication of their scientific explanations. Students may take on an inquiry frame, in which their responses are
sense-making elaborations; an expert frame, in which they talk in a way that presents what they know as a
personal construction; or an oral examination frame, in which they interpret the situation as one where they are
expected to produce ‘correct’, wordy answers.
Traditionally, researchers code the corpus of video data holistically in search of transitions between
clusters. These clusters are eyeballed from an aggregate of participants’ verbal and non-verbal behaviors
(Scherr & Hammer, 2009). When transitions are identified, researchers label the segments between
transitions with more or less meaningful codes, such as a color (e.g., green, red, blue) or a direct
interpretation of the frame (e.g., inquiry, expert, or exam). These frames are contextually dependent because
they reflect the students’ epistemologies in the context of reasoning within particular activities.
While the traditional approach to capturing epistemic frames has been fruitful, it also introduces some
limitations. Coding video recordings is time-consuming and, as inter-rater agreement estimates indicate,
always carries the potential for error. Moreover, coding might not be the best strategy for finding crucial moments
of interaction in a large corpus of video data. We wonder whether a more fine-grain analysis of video data
could be useful in identifying and representing patterns of actions and interactions during learning episodes that
might further develop our theoretical understanding of learning. Below we discuss how our new analytic
approach is at the intersection of theories of learning as processes and practices and recent developments in
learning analytic tools that could be useful in understanding such processes. Then, we introduce some examples
from our own research to illustrate how patterns of epistemic frames may be captured and represented.
Learning Analytics and Video Data
Many learning scientists use video recording as their source of data and employ a wide range of methodological
approaches to analyzing these video data to understand how people learn in various contexts (Derry et al.,
2010). Discourse analysis, interaction analysis, and ethnomethodology are methodological pillars in the
learning sciences, and these analytical perspectives often rely heavily on video data and allow researchers to
understand the moment-by-moment development of participants’ activities and communicative moves (an emphasis
often associated with interaction analysis; Jordan & Henderson, 1995).
New techniques for the analysis of qualitative data are of growing interest in both the learning
sciences and learning analytics communities (Martin & Sherin, 2013). These analytical methods serve as
complementary techniques to prior qualitative approaches. Specifically, exploratory techniques such as
clustering and multidimensional scaling, as well as social network analysis, are gaining renewed traction. Most
often, however, these analytics are applied to textual data, such as transcripts, online forums, chats, and
computer logs (e.g., Berland, Martin, Benton, Petrick Smith, & Davis, 2013; Blikstein, 2011; Sherin, 2013), and
they have not yet been used in the context of video-based interaction analysis.
Our goal is to explore the use of these analytical methods in the context of methodologies such as
interaction analysis, in the hope of increasing the consistency of results and providing convergent support for the
interaction analyst (Sherin, 2013). In coding qualitative data, researchers pay attention to certain features that are
relevant to interpreting and understanding participants’ practices and learning processes. Quantitative analysis of a
large sample of these qualitatively coded observations may highlight relationships in the data that would provide
researchers with empirical evidence of hidden patterns underlying interaction practices within and across
participants.
The ultimate purpose of this paper is to explore new ways to capture and represent epistemic frames
and their constitutive elements (i.e., observable behaviors), and to illustrate the possible usefulness and
theoretical implications of this new analytical approach. We propose fine-grain video analysis of large
samples of observations, which opens the possibility of analyzing patterns in the data that would otherwise
not be visible. The goal of this approach is to produce a systematic and consistent analysis of the
moment-by-moment interactions that goes beyond eyeballing a subsample of episodes across individual
participants.
In what follows, we discuss our fine-grain video-based analysis by introducing our own research in
interviewing first and second graders about complex systems thinking. Fifteen individual interviews of
approximately 10 minutes each were analyzed. These interviews were designed as a posttest after a short
instructional intervention to teach children about complex-systems thinking in the context of bees collecting
nectar (Danish, Saleh, & Andrade-Lotero, 2014). Each child answered nine questions about the behavior of
bees and ants gathering food. The questions were asked with the support of pictures and a NetLogo animation
(Wilensky, 1999).
A video-based analytic approach
We started our fine-grain analysis by producing two layers of data: a high-level ‘holistic’ set and a low-level
‘analytic’ set. Both data sets are based on the relevant features and epistemic frames described by
Russ et al. (2012). The high-level set (see Table 1) contains time stamps for transition moments along with
labels for the frames. The low-level set (see the excerpt in Table 2) records, for every 10-second episode, the
values of five variables, namely the five student behavioral features.
Table 1: High-level epistemic episodes data set.
[Table 1 excerpt: columns include “Time from” and the frame label (e.g., Oral Exam).]
Table 2: Low-level behavioral features data set.
[Table 2 excerpt: columns include Subject ID, Time, the frame label (e.g., Oral Exam), and feature levels such as No gesture, Not engaged, Eye contact, No eye contact, and No hedging.]
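As an illustration of how the two layers relate, the following Python sketch expands one subject’s holistic transitions into a frame label for each 10-second analytic episode. It is a minimal sketch under stated assumptions, not our coding pipeline: the record layout, the feature keys, the toy values, and the convention of assigning each episode the frame active at its midpoint are all hypothetical.

```python
from bisect import bisect_right

EPISODE_SECONDS = 10  # length of one low-level (analytic) episode


def frame_at(transitions, t):
    """Return the frame active at time t (in seconds), or None.

    `transitions` holds one subject's high-level layer as
    (start_time_in_seconds, frame_label) pairs sorted by start time.
    """
    starts = [start for start, _ in transitions]
    i = bisect_right(starts, t) - 1
    if i < 0:
        return None  # t falls before the first coded transition
    return transitions[i][1]


def join_layers(transitions, episodes):
    """Copy each episode row and add the frame active at its midpoint.

    `episodes` holds the low-level layer: dicts with a 0-based 'index'
    plus one key per behavioral feature (hypothetical layout).
    """
    joined = []
    for row in episodes:
        midpoint = row["index"] * EPISODE_SECONDS + EPISODE_SECONDS / 2
        joined.append({**row, "frame": frame_at(transitions, midpoint)})
    return joined


# Hypothetical subject: an oral-exam frame that shifts to inquiry at t = 22 s.
transitions = [(0, "Oral Exam"), (22, "Inquiry")]
episodes = [
    {"index": 0, "gesture": "No gesture", "gaze": "No eye contact"},
    {"index": 1, "gesture": "No gesture", "gaze": "No eye contact"},
    {"index": 2, "gesture": "Gesture", "gaze": "Eye contact"},
]
for row in join_layers(transitions, episodes):
    print(row["index"], row["frame"], row["gaze"])
```

The midpoint rule means that an episode containing a single transition is labeled with the frame that covers most of it; other conventions (e.g., labeling by the frame at the episode’s start) are equally defensible.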
We chose a 10-second interval, somewhat arbitrarily, based on previous work (Andrade Lotero, Danish, Moreno, &
Perez, 2013), but we noticed that some behaviors are more stable across time than others. We suggest
that the interval can be adjusted to the level of detail of interest. For instance, body position tends to remain
stable over longer periods of time, whereas gaze tends to vary frequently and rapidly. Because choosing
meaningful features is crucial to interpreting participants’ communicative moves, it is important to pay
attention to how these features change in relation to each other and, therefore, how much information each carries.
Specifically, frames and behavioral features are contextually dependent, and the relative importance of a feature
can be related to how much information it provides with respect to the other features and epistemic frames. To
illustrate, consider a body position that changes only at five-minute intervals while epistemic frames vary every
30 seconds: body position would then carry little information about frame transitions. Conversely, if gaze changes
much more rapidly than the epistemic frame, the comparatively stable frames may vary independently of gaze.
Moreover, because the length of the epistemic frames is unknown beforehand, coding a wide range of behavioral
features serves as a good initial approximation and, in turn, should inform the length of the interval used to code the data.
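One way to make “how much information a feature provides” operational is sketched below in Python. For each feature, it computes how often the level switches between consecutive episodes and the mutual information (in bits) between the feature’s levels and the frame labels. Mutual information is our assumed choice of measure, not one used in the analyses reported here, and the toy episodes are invented to mirror the scenarios above: a body position that never changes, a gaze that flickers independently of the frame, and a speech feature that changes together with the frame. With small samples, this plug-in estimate is biased upward and should be read cautiously.

```python
from collections import Counter
from math import log2


def switch_rate(levels):
    """Fraction of consecutive episode pairs in which the level changes."""
    if len(levels) < 2:
        return 0.0
    changes = sum(a != b for a, b in zip(levels, levels[1:]))
    return changes / (len(levels) - 1)


def mutual_information(feature, frames):
    """Plug-in estimate (in bits) of the mutual information between a
    feature's levels and the frame labels of the same episodes."""
    n = len(feature)
    joint = Counter(zip(feature, frames))
    feature_counts = Counter(feature)
    frame_counts = Counter(frames)
    mi = 0.0
    for (f, g), count in joint.items():
        # p(f, g) * log2(p(f, g) / (p(f) * p(g))), written with raw counts
        mi += (count / n) * log2(count * n / (feature_counts[f] * frame_counts[g]))
    return mi


# Hypothetical frame label and feature levels per 10-second episode.
frames = ["Exam"] * 4 + ["Inquiry"] * 4
features = {
    "body": ["Backward"] * 8,                       # never changes
    "gaze": ["Eye contact", "No eye contact"] * 4,  # flickers every episode
    "speech": ["Neutral"] * 4 + ["Soft"] * 4,       # changes with the frame
}
for name, levels in features.items():
    print(f"{name}: switch rate {switch_rate(levels):.2f}, "
          f"MI with frame {mutual_information(levels, frames):.2f} bits")
# body: switch rate 0.00, MI with frame 0.00 bits
# gaze: switch rate 1.00, MI with frame 0.00 bits
# speech: switch rate 0.14, MI with frame 1.00 bits
```

Both the never-changing and the constantly changing feature carry no information about the frame in this toy example; only the feature whose changes line up with frame transitions does.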
We have been developing three possibilities for an analysis that starts with this two-layered data set: (a)
visualizations, (b) stochastic models, and (c) an automated classifier. Because of space limitations in this
short paper, we elaborate only on the first possibility, visualizations.
Learning analytics tools are well suited to creating visualizations (Duval, 2011). In particular, researchers can find
patterns of behaviors that may cluster with epistemic frames, and from there they can make decisions
about further analyses or interpretations. We have developed two visualizations in the form of a descriptive
categorical time series analysis and a correspondence analysis.
Serial co-occurrence of features
A categorical line plot displays the serial development of the 10-second episodes across the levels of each feature,
visually revealing changes in each. Evidence of behavioral co-occurrence is found where levels switch
simultaneously across features. For instance, Figure 1 shows an example for one participant over her 10-minute
interview. The levels of each of the five features are plotted along the Y-axis, whereas time (in 10-second
episodes) is plotted along the X-axis. Lines going up and down do not represent larger or smaller quantities; they
only reflect a change of level within each feature. The dashed vertical line on the left-hand side of the
figure marks the fifth 10-second episode. It is apparent that there was a change in body position there, and that it
anticipated changes in the other features during the 10-second episode that followed. Instances of these
cross-feature changes can be seen throughout the rest of the interview.
Figure 1: An example of a time-series line plot for a particular subject.
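In Figure 1, these cross-feature changes were identified by inspection. A hypothetical automated check is sketched below in Python: given one subject’s feature levels per 10-second episode, it finds the episodes where a chosen “leader” feature switches and lists which other features switch in the same episode or one episode later. The feature names and the toy series (a body-position change in the fifth episode followed by gaze and gesture changes) are assumptions for illustration only.

```python
def switch_points(levels):
    """Return the indices of episodes whose level differs from the previous one."""
    return {i for i in range(1, len(levels)) if levels[i] != levels[i - 1]}


def cross_feature_changes(series, leader, lag=1):
    """For each switch of the `leader` feature, list the other features that
    switch in the same episode and those that switch `lag` episodes later.

    `series` maps a feature name to its levels, one per 10-second episode.
    """
    others = {name: switch_points(levels)
              for name, levels in series.items() if name != leader}
    report = []
    for i in sorted(switch_points(series[leader])):
        same = sorted(name for name, pts in others.items() if i in pts)
        later = sorted(name for name, pts in others.items() if i + lag in pts)
        report.append((i, same, later))
    return report


# Hypothetical series: body position changes in the fifth episode (index 4),
# gaze and gesture change in the sixth, and hedging never changes.
series = {
    "body": ["Backward"] * 4 + ["Forward"] * 4,
    "gaze": ["No eye contact"] * 5 + ["Eye contact"] * 3,
    "gesture": ["No gesture"] * 5 + ["Gesture"] * 3,
    "hedging": ["Hedging"] * 8,
}
for i, same, later in cross_feature_changes(series, "body"):
    print(f"body switches in episode {i + 1}; same episode: {same}; next episode: {later}")
# body switches in episode 5; same episode: []; next episode: ['gaze', 'gesture']
```

Running the same check with each feature as the leader, across all subjects, would give counts of how often one feature anticipates the others rather than a single illustrative case.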
Vector space displaying co-occurrence of features and frames
A correspondence analysis represents the chi-squared distances among the rows and columns of a contingency
table through spatial coordinates (dimensions) that discriminate among categories; the amount of the chi-squared
statistic (divided by the sample size) that each dimension accounts for is called its ‘inertia.’ By using differences
between expected and observed frequencies across combinations of features and epistemic frames, one can plot the
relative position of each level in a low-dimensional space. This analysis was performed on 7 cases to
ensure that sufficient information was available across all combinations of features and frames. In Figure 2, we
present the relative distances for each level across all five features with respect to the three types of frames. To
produce this plot, several features were dichotomized to avoid empty cells and to obtain at most
two dimensions that separate the categories. This visualization shows how particular levels of behaviors
cluster together with types of frames, an original finding from our research. For instance, making eye contact
with the interviewer, using hedging language, and speaking softly tend to co-occur with the inquiry frame.
Prolific gesturing, no hedging language, and speaking loudly tend to co-occur with the expert frame. Fidgeting,
quiet or neutral speech, and lack of eye contact tend to co-occur with the exam frame. In contrast, body position
(forward or backward) and the absence of gesturing do not clearly co-occur with any type of frame.
Figure 2: Correspondence analysis between frames and features.
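For readers who want to produce this kind of plot, the following Python/NumPy sketch implements a standard simple correspondence analysis through the singular value decomposition of standardized residuals. It is not necessarily the software or settings used for Figure 2. We assume, for illustration, that the rows are dichotomized feature levels stacked on top of one another and the columns are the three frames; the counts are invented and carry no empirical meaning. The principal inertias sum to the chi-squared statistic divided by the total count, and the signs of the axes are arbitrary.

```python
import numpy as np


def correspondence_analysis(table):
    """Simple correspondence analysis of a two-way contingency table.

    Returns (row_coords, col_coords, inertias): principal coordinates of
    rows and columns, and the principal inertia of each dimension.
    """
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                          # correspondence matrix
    r = P.sum(axis=1)                        # row masses
    c = P.sum(axis=0)                        # column masses
    expected = np.outer(r, c)                # proportions expected under independence
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = U * sv / np.sqrt(r)[:, None]
    col_coords = Vt.T * sv / np.sqrt(c)[:, None]
    return row_coords, col_coords, sv ** 2


# Invented counts of 10-second episodes (rows: dichotomized feature levels,
# columns: frames). For illustration only; these are not study data.
levels = ["Eye contact", "No eye contact", "Hedging", "No hedging"]
frames = ["Inquiry", "Expert", "Exam"]
counts = [[30, 12, 5],
          [8, 10, 25],
          [22, 4, 9],
          [16, 18, 21]]

rows, cols, inertias = correspondence_analysis(counts)
print("share of inertia per dimension:", np.round(inertias / inertias.sum(), 3))
for name, (x, y) in zip(levels + frames, np.vstack([rows, cols])[:, :2]):
    print(f"{name:15s} {x:+.2f} {y:+.2f}")
```

With three frames, at most two dimensions carry inertia, which is why a two-dimensional plot such as Figure 2 can display all of it.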
Future work
In further steps, we are building stochastic models that can assess the relative contribution of each feature in
representing the epistemic frames. For instance, we can quantify the strength of the association between
features and types of frames. We can also build a model in which frames are represented as a linear
combination of features. Finally, we can use this model to identify other episodes in the rest of the data
corpus. The manually coded episodes thus become a ‘training’ set from which one can build a classifier to
label the part of the data set that has not been subjected to traditional frame coding.
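The classifier described above can be sketched as a multinomial logistic regression, in which each frame’s score is a linear combination of one-hot-encoded feature levels. This is one possible model under our own assumptions (batch gradient descent, a small L2 penalty, hypothetical feature names and toy episodes), not a model we have validated. The manually coded episodes serve as the training set, and the fitted weights label episodes that were not coded.

```python
import numpy as np


def one_hot(rows, vocab=None):
    """Encode dicts of categorical feature levels as 0/1 columns.

    Pass the training `vocab` when encoding new rows; levels not seen
    during training are ignored.
    """
    if vocab is None:
        vocab = sorted({(k, v) for row in rows for k, v in row.items()})
    index = {kv: j for j, kv in enumerate(vocab)}
    X = np.zeros((len(rows), len(vocab)))
    for i, row in enumerate(rows):
        for kv in row.items():
            if kv in index:
                X[i, index[kv]] = 1.0
    return X, vocab


def add_bias(X):
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_softmax(X, y, n_classes, lr=0.5, epochs=2000, l2=1e-3):
    """Multinomial logistic regression fitted by batch gradient descent.

    Each frame's score is a linear combination of the one-hot features plus
    a bias; an L2 penalty (also applied to the bias, for simplicity) keeps
    the weights finite when the coded data are separable.
    """
    Xb = add_bias(X)
    W = np.zeros((Xb.shape[1], n_classes))
    Y = np.eye(n_classes)[y]                    # one-hot frame labels
    for _ in range(epochs):
        Z = Xb @ W
        Z -= Z.max(axis=1, keepdims=True)       # numerical stability
        probs = np.exp(Z)
        probs /= probs.sum(axis=1, keepdims=True)
        W -= lr * (Xb.T @ (probs - Y) / len(y) + l2 * W)
    return W


def predict(W, X):
    return (add_bias(X) @ W).argmax(axis=1)


# Hypothetical manually coded episodes (the training set).
frames = ["Inquiry", "Expert", "Exam"]
coded = [
    ({"gaze": "Eye contact", "hedging": "Hedging", "speech": "Soft"}, "Inquiry"),
    ({"gaze": "Eye contact", "hedging": "Hedging", "speech": "Soft"}, "Inquiry"),
    ({"gaze": "Eye contact", "hedging": "No hedging", "speech": "Loud"}, "Expert"),
    ({"gaze": "No eye contact", "hedging": "No hedging", "speech": "Loud"}, "Expert"),
    ({"gaze": "No eye contact", "hedging": "No hedging", "speech": "Neutral"}, "Exam"),
    ({"gaze": "No eye contact", "hedging": "Hedging", "speech": "Neutral"}, "Exam"),
]
X, vocab = one_hot([features for features, _ in coded])
y = np.array([frames.index(label) for _, label in coded])
W = fit_softmax(X, y, len(frames))

# Hypothetical uncoded episodes from the rest of the corpus.
uncoded = [
    {"gaze": "Eye contact", "hedging": "Hedging", "speech": "Soft"},
    {"gaze": "No eye contact", "hedging": "No hedging", "speech": "Loud"},
]
X_new, _ = one_hot(uncoded, vocab)
print([frames[k] for k in predict(W, X_new)])
```

The fitted weight for each (feature, level) pair and frame is one rough indicator of that level’s contribution to the frame. Before trusting automatic labels, one would hold out part of the coded episodes and compare the classifier’s labels with the human codes.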
Developing analytical tools for coded qualitative observations is of increasing interest to researchers in the
learning sciences. Although some work has been done on textual records such as online posts and participation
logs, video research has received less attention. We have presented here examples of new analytical tools for
the study of learning practices from video-based interaction analysis. A first consequence of being able to
conduct such fine-grain video analysis is that it enables the researcher to analyze patterns in large samples of
observations or coded data in ways that would otherwise not be possible. It allows researchers to display
visualizations of the moment-by-moment interaction across features of interest and across the whole data
corpus. It also makes it possible to build stochastic models with which one can assess the relative
contribution of each feature in representing the practices of interest. Another possible benefit is that these
models could be used to identify other episodes in the rest of the data corpus that have not been manually coded.
However, much work remains. For instance, current technology is not yet sufficiently fine-tuned to
capture arbitrary behavioral features automatically. We are confident that in the near future such procedures
will become common practice for researchers interested in video-based interaction analysis.
References
Andrade Lotero, L. A., Danish, J. A., Moreno, J., & Perez, L. (2013). Measuring ‘Framing’ Differences of
Single-Mouse and Tangible Inputs on Patterns of Collaborative Learning. Paper presented at CSCL 2013,
Madison, Wisconsin.
Berland, M., Martin, T., Benton, T., Petrick Smith, C., & Davis, D. (2013). Using learning analytics to
understand the learning pathways of novice programmers. Journal of the Learning Sciences, 22(4),
Blikstein, P. (2011). Using learning analytics to assess students' behavior in open-ended programming tasks.
Paper presented at the Proceedings of the 1st International Conference on Learning Analytics and Knowledge.
Danish, J. A., Saleh, A., & Andrade-Lotero, L. A. (2014). Software Scaffolds for Supporting Teacher-Led
Inquiry into Complex Systems Concepts. Paper presented at the AERA annual meeting, Philadelphia, PA.
Derry, S. J., Pea, R. D., Barron, B., Engle, R. A., Erickson, F., Goldman, R., . . . Sherin, M. G. (2010).
Conducting video research in the learning sciences: Guidance on selection, analysis, technology, and
ethics. The Journal of the Learning Sciences, 19(1), 3-53.
Duval, E. (2011). Attention please!: learning analytics for visualization and recommendation. Paper presented
at the Proceedings of the 1st International Conference on Learning Analytics and Knowledge.
Jordan, B., & Henderson, A. (1995). Interaction analysis: Foundations and practice. The Journal of the Learning
Sciences, 4(1), 39-103.
Lave, J., & Wenger, E. (1991). Situated learning: Legitimate peripheral participation. Cambridge, UK: Cambridge University Press.
Martin, T., & Sherin, B. (2013). Learning Analytics and Computational Techniques for Detecting and
Evaluating Patterns in Learning: An Introduction to the Special Issue. Journal of the Learning
Sciences, 22(4), 511-520.
Redish, E. F. (2004). A theoretical framework for physics education research: Modeling student thinking. arXiv
preprint physics/0411149.
Russ, R. S., Lee, V. R., & Sherin, B. L. (2012). Framing in cognitive clinical interviews about intuitive science
knowledge: Dynamic student understandings of the discourse interaction. Science Education, 96(4),
Scherr, R. E., & Hammer, D. (2009). Student behavior and epistemological framing: Examples from
collaborative active-learning activities in physics. Cognition and Instruction, 27(2), 147-174.
Sherin, B. (2013). A computational study of commonsense science: An exploration in the automated analysis of
clinical interview data. Journal of the Learning Sciences, 22(4), 600-638.
Wilensky, U. (1999). NetLogo. Center for Connected Learning and Computer-Based Modeling, Northwestern University, Evanston, IL.
Acknowledgments
We are grateful to Asmalina Saleh, Jacke McWilliams, and Branden Bryan for their contributions at various
stages of this work.