Understanding and predicting the behaviors of teachers using an online curriculum planning tool

Ogheneovo Dibie, Institute of Cognitive Science and Department of Computer Science, University of Colorado Boulder, 1777 Exposition Drive Boulder, CO 80309, ogheneovo.dibie@colorado.edu

Abstract: A key interest of the learning sciences and analytics community is in developing approaches to understanding the practices of teachers using educational technologies. This knowledge is particularly useful in improving the adoption and diffusion of these technologies. This paper presents a computational approach to understanding and predicting the behaviors of Earth science teachers using a web-based curriculum planning tool. It illustrates a methodology for characterizing user behaviors, understanding usage trends and frequent patterns of use, and predicting long-term behaviors of teachers from small time windows.

Introduction

From supporting educators in the classroom to providing students with learning opportunities outside the classroom, educational technologies play an important role in learning today (Brusilovsky, Farzan, & Ahn, 2005; Sumner & CCS Team, 2010). Understanding the complex work that constitutes teaching is an important area of research in the Learning Sciences (Fishman & Davis, 2006). As many of the processes involved in teaching move online, it is necessary to develop computational methods that improve our understanding of teaching practices within online environments.

This research demonstrates a computational approach to understanding and predicting teacher behavior in an online curriculum planning tool based on clickstream data. It builds upon research on computational approaches to understanding the use of educational digital resources. One such effort is the work of Maull, Saldivar, & Sumner (2011), which investigates a computational approach for understanding teacher use of a web-based curriculum planning tool.
They generated a typology of user behaviors by clustering the clickstream of users. A user typology refers to the categorization of aggregate user behaviors into distinct user types (Brandtzaeg, 2010). Each user type gives an overview of the manner of use of members of that type (Brandtzaeg, 2010). For example, a Facebook "power user" could be an individual who spends a lot of time on Facebook and uses it in a variety of contexts such as gaming, sharing video, and instant messaging, among others.

While current usage theories give insight into user types, they do not account for the evolution of such behavior and practices. A user classified as a power user (one who spends a lot of time on a product and exercises most of its features) after a year of using a product may not have exhibited this behavior from first use. This research aims to shed light on the evolution of a user type by understanding how the patterns of use that describe it change over time and what frequent patterns can be observed within it. Furthermore, this research introduces computational models for predicting a user's behavior based on knowledge of a user typology. This could be especially useful in assisting platform providers in supporting the needs of users based on predicted behaviors. This support could take the form of better professional development (PD) training or improved system design and usability.

Research Questions

The following questions will be addressed in this research:
1. Do the usage feature values that describe a user type remain the same over time or do they vary, and if so, how?
2. What are the frequent patterns of use that can be observed within a user type?
3. How well can computational models predict a user's eventual user type from smaller time windows?

Question one examines how the values of the usage features that describe a user type change over time. Question two describes what usage features go together within a particular user type.
It provides a better understanding of what users of a particular type are likely doing when doing something else. For question three, a set of classifiers is explored for predicting the user type of users at the end of a school year from earlier time frames, such as the first month of use and the first semester of use. The aim is to discover which time frame and usage features are most predictive of a user's eventual user type.

Research Context

The research proposed in this paper is based on teacher use of a web-based curriculum planning tool called the Curriculum Customization Service (CCS). The CCS provides middle and high school Earth Science teachers with access to digital versions of their instructional textbook (publisher material), curriculum-relevant high-quality digital library resources (interactive resources) and community-contributed resources (my stuff and shared stuff) (Sumner & CCS Team, 2010).

Figure 1: Snapshot of the CCS indicating its core components: Interactive Resources (open tab), Publisher materials (Earth Comm Activities tab), Community resources (Shared Stuff and My Stuff tabs).

Background

This research draws upon theoretical models and computational approaches for determining technology use.

Theoretical Models of Technology Use

Technology use research follows from a rich history of work in technology adoption. Technology adoption occurs when an individual decides that a technology is of value and decides to use it. Technology adoption theories include Rogers' innovation diffusion theory (Rogers, 1995), the concerns-based adoption model (Fuller, 1975), and the technology acceptance model (Legris, Ingham, & Collerette, 2003). A key theory on technology use is the theory of use-diffusion. Originally proposed by Ram and Jung (1990) and expanded on by Shih and Venkatesh (2004), it measures technology use on two dimensions: frequency and variety.
Usage frequency refers to how often a product is used, while variety refers to the different applications or contexts within which a product is used.

Computational Approaches for Determining Technology Use

Unlike the works of Ram & Jung and Shih & Venkatesh, which analyzed usage from self-reported data, computational approaches have been investigated to generate a typology of user behavior based on clustering of actual usage. Clustering is a data mining technique for automatically grouping related items into bins (Han & Kamber, 2006). Xu (2010) examined the use of clustering techniques to generate fine-grained user typologies within a web-based instructional tool known as the Instructional Architect (IA). The IA is an educational digital library service designed to facilitate the creation of simple instructional projects using web resources from the National Science Digital Library (NSDL) and the web in general (ibid).

Working with the Curriculum Customization Service (CCS), an online curriculum planning tool incorporating digital library resources, Maull et al. (2011) developed a typology of user behaviors observed in the CCS. This typology is inspired by a use-diffusion based methodology and characterizes use based on frequency and variety metrics observable through server usage logs. Frequency-based metrics include the number of sessions and hours spent, while variety-based metrics include the areas of the CCS that were accessed, such as interactive resources, publisher materials, and user-contributed resources.

This research extends the use-diffusion methodology used by Maull et al. (2011) for understanding technology use in the following ways:
(1) This work provides an understanding of how the usage features that characterize user types trend within the time period of the observed user type. A user type is defined by the values of a set of usage features (described in the Research Context section).
(2) This work introduces a market-basket analysis¹ of usage features to understand which features are likely used in concert per user type. This answers the question: when users of type Y are doing X, what else are they likely to be doing?
(3) This work explores computational models to predict an educator's behavior from small time windows of usage data. It aims to provide an understanding of the earliest time window of usage data and the usage features that best predict a user's eventual type.

¹ Market basket analysis is a data mining technique for understanding consumer behavior through their transaction patterns (Han & Kamber, 2006). It seeks to uncover meaningful and interesting associations in customer purchase data. A canonical example of the use of market-basket analysis is a grocery store analyzing consumer purchases to understand what items shoppers usually purchase together.

Research Design

The analyses conducted in this research are based on the use of the CCS by 80 Earth science teachers during the 2011-2012 school year. Usage is analyzed through the lens of six usage features that cover the core areas of the CCS. Following the use-diffusion methodology, these features are frequency based (number of sessions, hours spent) and variety based (publisher material activity, interactive resource activity, my stuff activity and shared stuff activity). Before describing each analysis, I introduce the idea of usage feature discretization, which informs the analyses performed.

Usage Feature Discretization

Each usage feature was discretized into three equal-frequency bins: high, medium (mid) and low. Given the relatively small size of the dataset, binning features made it possible to manage the wide distribution of continuous feature values in a consistent fashion.
To illustrate this, consider the set of usage features {Number of sessions, Hours spent, Interactive Resource Activity, My Stuff Activity, Shared Stuff Activity, Publisher Material Activity} with values {20,4,90,23,45,55} that define a user's aggregate usage at the end of an observation period. The feature set here indicates that the user spent 4 hours on the platform across 20 sessions and performed 90, 23, 45 and 55 click actions within the interactive resources, my stuff, shared stuff and publisher material areas of the system respectively. Discretized feature values such as {high,high,mid,low,high} give a sense of a user's usage in comparison to everyone else. Discretized feature values inform the characterization of the clusters generated from clustering usage, the trend analysis of usage features, the understanding of frequent patterns of use and the prediction of user types.

Characterizing user types

Following a clustering approach similar to that of Maull et al. (2011), a user typology was generated by clustering the clickstream of all users. The discretized feature values of each resulting cluster were mapped to the user types discovered by Maull et al. (2011) to generate a label. If a direct mapping did not exist, a new user type was generated.

Study 1: Detecting usage pattern trends

This study addresses the first research question. As noted earlier, a user type is described by the values of its set of usage features. The aim of this analysis is to understand how the mean value of each usage feature per user type changes over time. To do this, each usage feature is observed on a semester-to-semester, bimonthly and month-to-month basis. This gives a good idea of which usage features remain stable (in terms of use) and which vary per user type. It is particularly useful in detecting strong features that define a user type, as features that remain relatively stable could be good markers of a user type.
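
As a minimal sketch of the equal-frequency discretization described under Usage Feature Discretization, the Python below ranks the users on one feature and cuts the ranking into three bins of roughly equal size. It is an illustration under stated assumptions, not the implementation used in this study: the function names, the rank-based cut, the tie handling (ties are broken by a user's position in the list) and the six-user cohort are all hypothetical.

```python
def equal_frequency_bins(values, labels=("low", "mid", "high")):
    """Assign each value to one of len(labels) bins holding roughly equal counts.

    Assumption: bins are cut on rank, so tied values may fall into
    different bins; ties are broken by position in the input list.
    """
    n = len(values)
    k = len(labels)
    # Indices of the users, ordered from smallest to largest feature value.
    order = sorted(range(n), key=lambda i: values[i])
    binned = [None] * n
    for rank, i in enumerate(order):
        # rank * k // n maps ranks 0..n-1 onto bin indices 0..k-1.
        binned[i] = labels[rank * k // n]
    return binned


def discretize_features(usage):
    """Discretize every usage feature independently across all users."""
    return {feature: equal_frequency_bins(vals) for feature, vals in usage.items()}


# Hypothetical cohort of six users; the first user mirrors the example
# above (20 sessions, 4 hours). All other values are made up.
usage = {
    "number_of_sessions": [20, 3, 45, 7, 12, 31],
    "hours_spent": [4, 1, 9, 2, 3, 6],
}

discretized = discretize_features(usage)
for feature, bins in discretized.items():
    print(feature, bins)
```

Because each feature is binned relative to the whole cohort, a label such as "mid" for 20 sessions only has meaning in comparison to the other users, which is the sense in which discretized values describe a user's usage relative to everyone else.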

Study 2: Frequent patterns of use

This study addresses the second research question. The aim of this study is to understand which actions of members of a specific user type usually go together. As an example, this study addresses the question of what members of the power user type are likely doing when they spend a high amount of time on publisher materials. This gives a good idea of what usage feature correlations occur within a user type. I take a market-basket analysis approach to understand what frequent patterns occur within each user type by considering the feature values per month for each user as a 6-item transaction.

Study 3: Predicting user types

This study addresses the third research question. It examines the use of machine learning classifiers for predicting a user's behavior from small windows of time. The aim of this study is two-fold:
1. Determine the earliest window of time that provides the best prediction of a user's eventual class. A user's eventual class is the user type they hold membership in at the end of the year.
2. Determine the usage feature(s) that are most predictive of a user's eventual type.

Results

Characterizing User Types

The clustering analysis produced three distinct user types that roughly map to the user types discovered by Maull et al. (2011): power users, community seekers, and limited use users. The power users had a high value in each of the usage features examined. The community seekers used the CCS at a lower intensity than the power users; however, they had higher values in the shared stuff and my stuff areas. The limited use user type featured a low value in all features. Many users in this category either used the CCS platform sporadically or discontinued use after just a month or two.

Usage Pattern Trends

Power users: All usage features had a high value on a semester-to-semester and bimonthly basis.
However, when the usage features were analyzed on a month-to-month basis, interactive resource activity, my stuff activity and shared stuff activity tended to vary between the mid and high value bins. This indicates that high values in the number of sessions, hours spent and publisher material activity are good indicators of a power user.

Community seekers: The semester-to-semester, bimonthly and month-to-month analyses of this user type showed that, although all usage features tended to oscillate between the low and mid value ranges, shared stuff activity, publisher material activity and my stuff activity tended to be the most stable usage features.

Limited use: While users in this category generally tend to have a low value of use in all areas, on a month-to-month basis usage feature values tend to vary between the low and mid value bins.

Frequent Patterns of Use

As a reminder, the goal of this study is to understand how usage features are used in concert. It aims to answer the question: when a particular feature is used to a certain degree, what else are users likely to be doing? As in the usage pattern trend analysis, frequent patterns were analyzed on a per-user-type basis. I discuss the most interesting associations discovered below.

Power user type: On average, users in this category have a high value in all usage features. Considering the monthly usage of each user as an item set in the transaction, it was discovered that when users exhibited a high frequency of use (a high value in the hours spent and number of sessions features), they were likely spending most of their time accessing publisher materials, which also had a value of high.

Community seeker user type: Users in this group are more likely to use community resources than users in the other groups. Even when they have a low frequency of use, they are more likely to be accessing community resources.
Limited use: Users in this group featured a low use of the CCS on average. No particularly interesting association rules were discovered; all rules tend to reinforce the fact that when users have a low use of the system, they are also likely to have a low use of other system areas.

Predicting User Behavior

To predict user behavior, a set of machine learning classifiers was applied to semester, bimonthly and monthly time windows. The months of March-May were the most predictive of a user's type, with a prediction accuracy of 87.5%. However, this prediction is not very useful, as it comes at the tail end of the school year. October was found to be the earliest month where a fairly good prediction accuracy of 72.5% could be achieved. Furthermore, the number of sessions, shared stuff activity and publisher material activity were the most predictive features for that month.

Discussion & Conclusion

This paper introduced a methodology for characterizing user types, detecting trends in usage via changes in usage feature values, understanding frequent patterns of use and predicting long-term usage behavior as described by user types. This work uses clickstream data to capture teacher behaviors, getting at what they do in an online system, but not at why (a challenge for learning analytics research). Future work will examine a different metric for usage time. This work uses months of use. An alternative could be to use the number of clicks as a measure of usage time. Thus, instead of considering usage at the end of a school year, analysis could be carried out on the first 1000 clicks of each user.

References

Brusilovsky, P., Farzan, R., & Ahn, J.-w. (2005). Comprehensive Personalized Information Access in an Educational Digital Library. Joint Conference on Digital Libraries, 9-18.
Fishman, B., & Davis, E. A. (2006). Teacher learning research and the learning sciences. In R. K. Sawyer (Ed.), The Cambridge handbook of the learning sciences (pp. 535-550). New York: Cambridge University Press.
Fuller, F. F. (1975). Concerns of teachers: A developmental conceptualization. Teaching Effectiveness: Its Meaning, Assessment, and Improvement, 175.
Gerardo, M. S. (2012). Teacher Adoption of a web-based instructional planning system. Ph.D. dissertation. Boulder: University of Colorado Boulder.
Legris, P., Ingham, J., & Collerette, P. (2003). Why do people use information technology? A critical review of the technology acceptance model. Information & Management, 191-204.
Maull, K. E., Saldivar, M. G., & Sumner, T. (2011). Understanding digital library adoption: a use diffusion approach. Joint Conference on Digital Libraries. Ottawa.
Ram, S., & Jung, H.-S. (1990). Journal of the Academy of Marketing Science, 18(1), 67-76.
Rogers, E. M. (1995). Diffusion of innovations. Simon and Schuster.
Shih, C.-F., & Venkatesh, A. (2004). Beyond adoption: development and application of a use-diffusion model. Journal of Marketing, 68(1), 59-72.
Sumner, T., & CCS Team. (2010). Customizing science instruction with educational digital libraries. Joint Conference on Digital Libraries. New York.
Xu, B. (2010). Understanding Teacher Use of a Digital Library Service: A Clustering Approach. Logan, Utah: Utah State University.