Video-Based Learning Analytics for Epistemic Frame Analysis in Semi-Structured Interviews

Alejandro Andrade-Lotero, Ginette Delandshere, and Joshua A. Danish, Indiana University Bloomington, laandrad@indiana.edu

Abstract: Developing analytical tools for coded qualitative observations is of increasing interest to researchers in the learning sciences. We present examples of new analytical tools for the study of learning practices based on fine-grain video-based interaction analysis, which enables the researcher to analyze patterns in large samples of coded data. With these tools, researchers can display visualizations, build stochastic models to assess the relative contribution of features, and identify other episodes in the rest of the data that have not been manually coded.

Researchers study student epistemologies because they are interested in understanding students’ strategies for producing scientific explanations (Redish, 2004). The assumption is that students sometimes activate the appropriate set of resources when developing their explanations and sometimes do not. Epistemic frames are useful in understanding whether students develop conceptual understanding and sophisticated views of knowledge and learning in their science courses (Scherr & Hammer, 2009). As part of students’ conceptual ecologies, student framing is regarded as a basis for judging the stability and consistency of students’ scientific explanations (Russ, Lee, & Sherin, 2012).

As an alternative to self-reports, researchers who study the contextual activation of epistemic resources consider it more appropriate to observe participants’ communicative moves as well as other behavioral and paralinguistic features during learning activities (Scherr & Hammer, 2009). In studying these episodes, researchers rely on the observation of mutually reinforcing behaviors that reveal participants’ underlying sense of what the activity is calling for with respect to knowledge (Scherr & Hammer, 2009).
The assumption is that these clusters of behaviors are interpretable not only to participants themselves but also to the researcher who observes such interactions (Jordan & Henderson, 1995). Conversation and interaction analyses are the primary methodological strategy because researchers infer students’ understanding of the interaction indirectly. This assumption is framed within a belief that knowledge and learning are fundamentally based in social practices and material ecologies (Lave & Wenger, 1991). For instance, Russ et al. (2012) selected a set of verbal, non-verbal, and prosodic behaviors that were useful in determining student epistemic framing. In particular, they paid attention to students’ body position (forward or backwards), their gaze (at the interviewer or not), their gesturing (prolific or not), their hedging language (“Mmm, I don’t know”), and the clarity of their speech (tone and speed). With this method, Russ et al. (2012) found that in the course of cognitive interviews, students tend to switch back and forth between at least three different kinds of epistemic frames, which may be more or less productive in terms of the quality and sophistication of their scientific explanations. Students may take on an inquiry frame, in which responses are sense-making elaborations; an expert frame, in which students talk in a way that reflects what they know as a personal construction; or an oral examination frame, in which they interpret the situation as an exam where they are expected to produce ‘correct’, wordy answers.

Traditionally, researchers code the corpus of video data holistically in search of transitions between clusters. These clusters are eyeballed from an aggregate of participants’ verbal and non-verbal behaviors (Scherr & Hammer, 2009).
When transitions are identified, researchers label the segments between transitions with more or less meaningful codes, such as a color (e.g., green, red, blue) or a direct interpretation of the frame (e.g., inquiry, expert, or exam). These frames are contextually dependent because they reflect the students’ epistemologies in the context of reasoning within particular activities.

While the traditional approach to capturing epistemic frames has been fruitful, it also has some limitations. The coding of video recordings is time consuming and prone to error, as indicated by inter-rater agreement estimates. In other cases, coding might not be the best strategy to find crucial moments of interaction in a large corpus of video data. We wonder whether a more fine-grain analysis of video data could be useful in identifying and representing patterns of actions and interactions during learning episodes that might further develop our theoretical understanding of learning. Below we discuss how our new analytic approach sits at the intersection of theories of learning as processes and practices and recent developments in learning analytic tools that could be useful in understanding such processes. Then, we introduce some examples from our own research to illustrate how patterns of epistemic frames may be captured and represented.

Learning Analytics and Video Data

Many learning scientists use video recording as their source of data and employ a wide range of methodological approaches to analyzing these video data to understand how people learn in various contexts (Derry et al., 2010).
Discourse analysis, interaction analysis, and ethnomethodology are methodological pillars in the learning sciences. These analytical perspectives often rely heavily on video data and allow researchers to understand the moment-by-moment development of participants’ activities and communicative moves (Jordan & Henderson, 1995).

New techniques for the analysis of qualitative data are currently of growing interest in both the learning sciences and learning analytics communities (Martin & Sherin, 2013). These analytical methods serve as complementary techniques to prior qualitative approaches. Specifically, classification techniques, such as clustering and multidimensional scaling, as well as social network analysis, are gaining renewed traction. Most often, however, these analytics are applied to textual data, such as transcripts, online forums, chats, and computer logs (cf. Berland, Martin, Benton, Petrick Smith, & Davis, 2013; Blikstein, 2011; Sherin, 2013), and have not been used in the context of video-based interaction analysis. Our goal is to explore the use of these analytical methods in the context of methodologies such as interaction analysis, in the hope of increasing the consistency of results and providing convergent support for the interaction analyst (Sherin, 2013).

In coding qualitative data, researchers pay attention to certain features that are relevant to interpreting and understanding participants’ practices and learning processes. Quantitative analysis of a large sample of these qualitatively coded observations may highlight relationships in the data that would provide researchers with empirical evidence of the hidden underlying patterns of interaction practices within and across participants.
The ultimate purpose of this paper is to explore new ways to capture and represent epistemic frames and their constitutive elements (i.e., observable behaviors), as well as to illustrate the possible usefulness and theoretical implications of this new analytical approach. We propose a fine-grain video analysis and the use of large samples of observations that would open the possibility of analyzing patterns in the data that would not be visible otherwise. The goal of this approach is to produce a systematic and consistent analysis of the moment-by-moment interactions that goes beyond eyeballing a subsample of episodes across individual cases.

In what follows, we discuss our fine-grain video-based analysis by introducing our own research in interviewing first graders about complex systems thinking. Fifteen ~10-minute individual interviews of first and second graders were subjected to analysis. These interviews were designed as a posttest after a short instructional intervention to teach children about complex-systems thinking in the context of bees collecting nectar (Danish, Saleh, & Andrade-Lotero, 2014). Each child answered nine questions about the behavior of bees and ants gathering food. The questions were asked with the support of pictures and a NetLogo animation (Wilensky, 1999).

A video-based analytic approach

We started our fine-grain analysis by producing two layers of data, a high-level ‘holistic’ set and a low-level ‘analytic’ set. It is important to notice that both data sets are based on the relevant features and epistemic frames elaborated in Russ et al.’s (2012) paper. The high-level set (see table 1) contains time stamps for transition moments along with labels for the frames. The low-level set (see an excerpt in table 2), on the other hand, contains information for every 10-second episode with respect to five variables, that is, the five student behavioral features.

Table 1: High-level epistemic episodes data set.
Time from    Frame
[02:43.09]   Oral Exam
[02:46.03]   Inquiry
[02:48.07]   Oral Exam
[03:12.14]   Inquiry
[03:15.08]   Oral Exam
[03:32.05]   Inquiry

Table 2: Low-level behavioral features data set.

Subject ID   Time      Hedging      Speech   Gesture      Body          Gaze
17           00:49.3   Quiet        Silent   No gesture   Not engaged   No eye contact
17           00:59.3   Quiet        Silent   No gesture   Engaged       Eye contact
17           01:09.3   No hedging   Soft     No gesture   Not engaged   No eye contact

We chose an arbitrary 10-second interval based upon previous work (Andrade Lotero, Danish, Moreno, & Perez, 2013), but we noticed that some behaviors are more stable across time than others. We suggest that this interval can change depending on the detail of interest. For instance, body position tends to remain stable across longer periods of time, whereas gaze tends to vary frequently and rapidly.

Because choosing meaningful features is a crucial task in interpreting participants’ communicative moves, it is important to pay attention to how these features change in relation to each other and, therefore, carry more or less information. Specifically, frames and behavioral features are contextually dependent, and the relative importance of a feature can be related to how much information it provides with respect to the other features and epistemic frames. To illustrate this point, think of a body position that only changes at five-minute intervals whereas epistemic frames may vary every 30 seconds. Conversely, if gaze changes rapidly in comparison to the epistemic frame, the stable frames perhaps vary independently of gaze or body position. On the other hand, because the length of the epistemic frames is unknown beforehand, a wide range of behavioral features would serve as a good initial approximation and, in turn, should inform the length of the interval used to code the data.
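To make the two-layered data structure concrete, the following is a minimal Python sketch, not the procedure used in the study. It assumes time stamps of the form mm:ss.ff, uses the transition times and frame labels as they appear in table 1, and introduces three hypothetical helpers (to_seconds, frame_at, mean_run_length). The first two look up which frame a given 10-second episode falls in; the third gives a rough measure of how stable a feature is across episodes, which is one possible way to inform the choice of interval.

```python
from bisect import bisect_right
from itertools import groupby


def to_seconds(stamp):
    """Convert a 'mm:ss.ff' time stamp (brackets optional) to seconds."""
    minutes, seconds = stamp.strip("[]").split(":")
    return int(minutes) * 60 + float(seconds)


# High-level layer: frame transitions as listed in table 1.
transitions = [
    ("[02:43.09]", "Oral Exam"),
    ("[02:46.03]", "Inquiry"),
    ("[02:48.07]", "Oral Exam"),
    ("[03:12.14]", "Inquiry"),
    ("[03:15.08]", "Oral Exam"),
    ("[03:32.05]", "Inquiry"),
]
starts = [to_seconds(stamp) for stamp, _ in transitions]
labels = [frame for _, frame in transitions]


def frame_at(stamp):
    """Return the frame in force at a time stamp.

    Returns None for times before the first listed transition.
    """
    index = bisect_right(starts, to_seconds(stamp)) - 1
    return labels[index] if index >= 0 else None


def mean_run_length(levels):
    """Average number of consecutive episodes a feature keeps one level."""
    runs = [len(list(group)) for _, group in groupby(levels)]
    return sum(runs) / len(runs)


# Hypothetical low-level episodes (invented for illustration only).
gaze = ["No eye contact", "Eye contact", "No eye contact", "Eye contact"]
body = ["Not engaged", "Not engaged", "Not engaged", "Engaged"]
print(frame_at("02:47.00"), mean_run_length(gaze), mean_run_length(body))
```

A feature with a long mean run length relative to the typical frame duration changes too slowly to discriminate between frames at that interval, which is the situation described above for body position.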
We have been developing three possibilities for an analysis that starts with this two-layered data set: (a) visualizations, (b) stochastic models, and (c) an automated classifier. In what follows, and because of the limitations of space in this short paper, we only elaborate on the first possibility, visualizations.

Visualizations

Learning analytics are well suited to creating visualizations (Duval, 2011). In particular, researchers can find patterns of behaviors that may be clustering with epistemic frames, and from here they can make decisions about further analyses or interpretations. We have developed two visualizations in the form of a descriptive categorical time series analysis and a correspondence analysis.

Serial co-occurrence of features

A categorical line plot displays the serial development of the 10-second episodes across levels of features by visually revealing the change in each. Evidence of behavioral co-occurrence is found in levels simultaneously switching across features. For instance, figure 1 shows an example for one participant along her 10-minute interview. The levels of each of the five features are plotted along the Y-axis, whereas the successive 10-second episodes are plotted along the X-axis. Lines going up and down do not represent a greater or smaller proportion of levels; these differences only reflect a change in the levels within each feature. The dashed vertical line on the left-hand side of the figure marks the fifth 10-second episode. It is apparent that there was a change in the body position, and that it anticipated a change in the other features during the 10-second episode that followed. Instances of these cross-feature changes can be seen along the rest of the interview.

Figure 1: An example of a time-series line plot for a particular subject.
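The cross-feature switching that the line plot reveals visually can also be located programmatically. The sketch below is one possible way to do so, under stated assumptions: each feature is stored as a list of levels, one per 10-second episode, and the helper names (change_points, simultaneous_switches, leads) are hypothetical. The episode data are invented to mimic the pattern described for figure 1 and are not taken from the study.

```python
def change_points(levels):
    """Indices of episodes whose level differs from the preceding episode."""
    return {i for i in range(1, len(levels)) if levels[i] != levels[i - 1]}


def simultaneous_switches(features, min_features=2):
    """Map episode index -> features switching there.

    Only indices where at least `min_features` features switch are kept.
    """
    switched = {}
    for name, levels in features.items():
        for index in change_points(levels):
            switched.setdefault(index, []).append(name)
    return {
        index: sorted(names)
        for index, names in sorted(switched.items())
        if len(names) >= min_features
    }


def leads(leader, follower, lag=1):
    """Indices where a change in `leader` is followed, `lag` episodes
    later, by a change in `follower`."""
    follower_changes = change_points(follower)
    return sorted(i for i in change_points(leader) if i + lag in follower_changes)


# Hypothetical episodes (zero-based index, so index 4 is the fifth episode).
features = {
    "body": ["Not engaged"] * 4 + ["Engaged"] * 3,
    "gaze": ["No eye contact"] * 5 + ["Eye contact"] * 2,
    "speech": ["Silent"] * 5 + ["Soft"] * 2,
}
print(simultaneous_switches(features))
print(leads(features["body"], features["gaze"]))
```

In this toy example the body position changes at the fifth episode, and gaze and speech switch together one episode later, which is the kind of anticipation pattern the plot is meant to surface.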
Vector space displaying co-occurrence of features and frames

A correspondence analysis provides chi-squared distances in contingency tables from which one can obtain spatial coordinates that discriminate among categories, also called ‘inertias.’ By using differences between expected and observed frequencies across combinations of features and epistemic frames, one can locate the relative position of each level in a multidimensional space. This analysis was performed including 7 cases to ensure that sufficient information was provided across all combinations of features and frames. In figure 2, we present the relative distances for each level across all five features with respect to the three types of frames. To produce this plot, several features were dichotomized with the aim of avoiding empty cells and creating at most two dimensions that separate among categories.

This visualization shows how particular levels of behaviors cluster together with types of frames, an original finding from our research. For instance, making eye contact with the interviewer, use of hedging language, and speaking softly tend to co-occur with the inquiry frame. Prolific gesturing, no hedging language, and speaking loudly tend to co-occur with the expert frame. Fidgeting, quiet or neutral speech, and lack of eye contact tend to co-occur with the exam frame. On the other hand, it seems that body position (forward or backwards), along with no gesturing, does not clearly co-occur with any type of frame.

Figure 2: Correspondence analysis between frames and features.

Future work

In further steps, we are building stochastic models that can assess the relative contribution of each feature in representing the epistemic frames. For instance, we can quantify the strength of the association between features and types of frames. We can also build a model in which frames can be represented as a linear combination of features.
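As a simple illustration of quantifying the association between one feature and the frames, the sketch below compares observed and expected frequencies in a contingency table, the same comparison that underlies the correspondence analysis. It is a sketch under assumptions rather than the model being built: the function names are hypothetical, the counts are invented, and Cramér's V is used here as one common summary of association strength. The chi-squared statistic divided by the number of observations is the total inertia that a correspondence analysis decomposes; obtaining the plotted coordinates would additionally require a singular value decomposition of the standardized residuals, which this sketch does not perform.

```python
from collections import Counter
from math import sqrt


def contingency(pairs):
    """Build a feature-level by frame table of counts from (level, frame) pairs."""
    counts = Counter(pairs)
    rows = sorted({level for level, _ in pairs})
    cols = sorted({frame for _, frame in pairs})
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table


def chi_squared(table):
    """Return the chi-squared statistic and the standardized residuals."""
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    residuals = []
    for i, row in enumerate(table):
        residual_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            residual = (observed - expected) / sqrt(expected)
            residual_row.append(residual)
            chi2 += residual ** 2
        residuals.append(residual_row)
    return chi2, residuals


def cramers_v(table):
    """Association strength between 0 (none) and 1 (perfect)."""
    n = sum(map(sum, table))
    chi2, _ = chi_squared(table)
    k = min(len(table), len(table[0])) - 1
    return sqrt(chi2 / (n * k))


# Hypothetical coded episodes: (gaze level, frame). Invented counts.
pairs = (
    [("Eye contact", "Inquiry")] * 8
    + [("Eye contact", "Oral Exam")] * 2
    + [("No eye contact", "Inquiry")] * 3
    + [("No eye contact", "Oral Exam")] * 7
)
rows, cols, table = contingency(pairs)
chi2, residuals = chi_squared(table)
total_inertia = chi2 / len(pairs)
print(rows, cols, table, round(cramers_v(table), 3))
```

Positive residuals mark feature levels that occur with a frame more often than independence would predict, so the sign pattern of the residuals gives a tabular counterpart to the clustering seen in the plot.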
Finally, building on such a model, we can identify other episodes in the rest of the data corpus. The manually coded data thus become a ‘training’ set from which one can automate a classifier to classify the part of the data set that has not been subjected to traditional frame coding.

Conclusion

Developing analytical tools for coded qualitative observations is of increasing interest to researchers in the learning sciences. Although some work has been done on textual records such as posts and logs from online participation, video research has received less attention. We have presented here examples of new analytical tools for the study of learning practices from video-based interaction analysis. A first consequence of being able to conduct such fine-grain video analysis is that it enables the researcher to analyze patterns in large samples of observations or coded data, which would not be possible otherwise. It allows researchers to display visualizations of the moment-by-moment interaction across features of interest and across the whole data corpus. It also makes stochastic models available from which one can assess the relative contribution of each feature in representing the practices of interest. Another possible benefit is that these models could be used to identify other episodes in the rest of the data corpus that have not been manually coded. However, much work remains. For instance, current technology is still not sufficiently fine-tuned to provide automated capturing of any given feature. We are confident that in the near future such procedures will become common practice for researchers interested in video-based interaction analysis.

References

Andrade Lotero, L. A., Danish, J. A., Moreno, J., & Perez, L. (2013). Measuring ‘Framing’ Differences of Single-Mouse and Tangible Inputs on Patterns of Collaborative Learning. Paper presented at the CSCL, Madison, Wisconsin.
Berland, M., Martin, T., Benton, T., Petrick Smith, C., & Davis, D. (2013). Using learning analytics to understand the learning pathways of novice programmers. Journal of the Learning Sciences, 22(4), 564-599.
Blikstein, P. (2011). Using learning analytics to assess students' behavior in open-ended programming tasks. Paper presented at the Proceedings of the 1st International Conference on Learning Analytics and Knowledge.
Danish, J. A., Saleh, A., & Andrade-Lotero, L. A. (2014). Software Scaffolds for Supporting Teacher-Led Inquiry into Complex Systems Concepts. Paper presented at the AERA annual meeting, Philadelphia, PA.
Derry, S. J., Pea, R. D., Barron, B., Engle, R. A., Erickson, F., Goldman, R., . . . Sherin, M. G. (2010). Conducting video research in the learning sciences: Guidance on selection, analysis, technology, and ethics. The Journal of the Learning Sciences, 19(1), 3-53.
Duval, E. (2011). Attention please!: Learning analytics for visualization and recommendation. Paper presented at the Proceedings of the 1st International Conference on Learning Analytics and Knowledge.
Jordan, B., & Henderson, A. (1995). Interaction analysis: Foundations and practice. The Journal of the Learning Sciences, 4(1), 39-103.
Lave, J., & Wenger, E. (1991). Situated learning: Legitimate peripheral participation. Cambridge University Press.
Martin, T., & Sherin, B. (2013). Learning Analytics and Computational Techniques for Detecting and Evaluating Patterns in Learning: An Introduction to the Special Issue. Journal of the Learning Sciences, 22(4), 511-520.
Redish, E. F. (2004). A theoretical framework for physics education research: Modeling student thinking. arXiv preprint physics/0411149.
Russ, R. S., Lee, V. R., & Sherin, B. L. (2012). Framing in cognitive clinical interviews about intuitive science knowledge: Dynamic student understandings of the discourse interaction. Science Education, 96(4), 573-599.
Scherr, R. E., & Hammer, D. (2009). Student behavior and epistemological framing: Examples from collaborative active-learning activities in physics. Cognition and Instruction, 27(2), 147-174.
Sherin, B. (2013). A computational study of commonsense science: An exploration in the automated analysis of clinical interview data. Journal of the Learning Sciences, 22(4), 600-638.
Wilensky, U. (1999). NetLogo. Center for Connected Learning and Computer-Based Modeling, Northwestern University.

Acknowledgments

We are grateful to Asmalina Saleh, Jacke McWilliams, and Branden Bryan for their contributions in various steps along the process.