Teaching and Learning Research Design Guide: First Edition

Version Date: August 6, 2015

Authors:
Gregory Hum, PhD Candidate, Faculty of Education, Simon Fraser University, ghum@sfu.ca
Jack Davis, PhD Candidate, Department of Statistics and Actuarial Science, Simon Fraser University, jackd@sfu.ca

Creative commons license
This work is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/4.0/.

Suggested citation (APA)
Hum, G. and Davis, J. (2015). Teaching and Learning Research Design Guide: First Edition. Retrieved from http://www.researchprism.com/roadmap/tlr_guide_1st_edition.pdf

Author’s notes
This document originates from our work at the Institute for the Study of Teaching and Learning in the Disciplines (http://www.sfu.ca/istld.html) as a means of documenting what we have learned in the course of advising Teaching and Learning projects which were part of the Simon Fraser University Teaching and Learning Development Grants (http://www.sfu.ca/tlgrants.html). As this original informal documentation grew, we found that there was significant interest in this more formal documentation, both from our immediate colleagues and from others with whom we discussed our work.

Acknowledgements
The further development of this document was supported in part by funding through the Institute for the Study of Teaching and Learning in the Disciplines. We would also like to acknowledge and thank those who have provided feedback: Cheryl Amundsen, Lannie Kanevsky, Cindy Xin, Andrew Wylie, Angela McLean and Veronica Hotton.
This work continues through the following efforts:

Teaching and Learning Research Roadmap: A practical guide to conducting applied research on learning and instruction (Gregory Hum, Jack Davis)
This documentation continues and expands the original work into a broader conceptualisation of project designs, as well as linking design with analytic procedures, through revised documentation and an interactive website (www.researchprism.com/roadmap/). The interactive website is a supplement to the aforementioned documentation and aims to help researchers in a variety of contexts design projects to study learning and instruction.

Teaching and Learning Research handout (Institute for the Study of Teaching and Learning in the Disciplines)
This document is an adaptation of the work described here. Its content is tailored for, and will continue to be developed by, the members of the Institute for the Study of Teaching and Learning in the Disciplines to support the Teaching and Learning Development Grants.

Introduction
The overall goal of this document is to help you approach questions related to teaching and learning, with a focus on evaluating student learning. It is aimed primarily at those who are already experienced in research, but may not have experience with the issues and methods of teaching and learning research (TLR). The goal is to provide a user-friendly guide that can help you relate and connect your existing research and disciplinary knowledge with research and disciplinary knowledge for TLR. This guide provides a range of possible methods, data sources, and analyses to help you consider how your existing knowledge might be applied to TLR questions, as well as some potential new options to consider. Ultimately, this guide aims to help you make informed and practical choices about the design of your project to ensure you get helpful and practical results to inform your own work and to share with others.
This document is informed both by our own experiences with this work and by the documented experiences of previous Teaching and Learning Development grant recipients; where possible, we have provided examples from these projects. This is an ongoing work that we are always interested in expanding, and we would be happy to hear about your experiences and insights with this documentation. Our contact information is found at the beginning of this document.

General Guidelines
There are two overlapping purposes for Teaching and Learning Research (TLR). You should consider your relative interests/emphases for each research question and your project as a whole.

Exploratory purpose questions are open and detailed. These are questions pertaining to how or why something worked, with few prior assumptions or guesses, allowing for “emergent” findings. For example: What worked or did not work about the discussion activity designed? What might be helpful or surprising to know if someone else were adopting this activity, or what would you change next time you did it? This purpose is most often associated with formative evaluation and qualitative research. This type of research rarely aims for generalisability. Instead, findings from exploratory research speak primarily to your particular course or situation. Even without full generalisability, in-depth information about a particular subject of study can nonetheless be informative for others to consider and/or adapt to their own work/contexts, particularly if their own work or contexts share similarities with your own descriptions.

Testing, by contrast, is narrower; testing aims to determine whether something “worked”, often by comparison (e.g., did students who did a discussion activity learn more than those who only had a lecture?) or by determining whether a pre-established hypothesis or idea was correct (e.g., watching videos increases student learning).
This purpose is most associated with summative evaluation and quantitative research. Testing research is most effective after substantial prior information has been gathered so that the most useful questions are being asked. A testing project may emerge from and be informed by the results of an exploratory project. The ultimate goal for some with this intent is “generalisability”, that is, the ability to claim your findings apply to groups beyond those whom you studied (e.g., math students in other universities, all students in Canada, all instructors in the world). In practice, generalisability is often difficult to realise within teaching and learning research projects due to constraints discussed in this document.

Many projects will use some combination of both the testing and exploratory approaches, but the relative emphasis may differ based on the project’s focus, your intentions, and practical constraints. While it is possible to have an exclusively testing-oriented project, it is recommended that you incorporate an exploratory element into any testing project to help provide detail to your findings.

General challenges to consider:
In most cases, you will be conducting research on a classroom. Much research, especially quantitative research and especially controlled experimentation, emphasises controlled and simplified contexts/treatments. However, classrooms are complex and “natural” environments, not laboratories where goals and tasks can be simplified and conditions well-controlled. The following are just some ways that “natural” learning environments differ from controlled laboratory research. You should keep in mind how the following may influence/bias your results and how you might account for them in your design and/or interpretations.

• Individual students and class cohorts (e.g., year to year or by terms) can differ radically. Some have more or less prior knowledge and/or motivation, for example.
• Instructors heavily influence teaching and learning. You may influence the results through your own expectations or changed level of motivation. Different instructors also will invariably teach differently in subtle and not-so-subtle ways that can influence outcomes (e.g., are different TAs teaching different tutorial sections? In what ways are the TAs and the tutorial sections different?)

• The classroom is not a laboratory. Many things influence teaching and learning in the classroom, most of which are not easily accounted for or controlled (e.g., time of the course, other courses taken in the same semester, lighting, student cohort differences, etc.). To learn more you can read about the Hawthorne effect, placebo effect, or John Henry effect for known versions of these issues.
http://en.wikipedia.org/wiki/John_Henry_effect
http://en.wikipedia.org/wiki/Hawthorne_effect

Note that there are statistical and research design means of mitigating these effects. Some of these are described further on in this documentation.

Choosing data sources
Each data source is best for certain research methodologies and purposes and has particular challenges. To inform your selection of the types of data you will collect, think carefully about how each data source contributes to one of your research questions and how/whether the findings will be useful. Think about what the findings might “look like”, what different results might mean, and whether they are meaningful. Some questions to consider:

• How will the data I collect answer/inform my specific questions?
• What are some ways I expect the findings to “turn out”, and what will different results mean?
• How will the findings support improvement in this course or in other courses?
• How will the findings influence my approach to teaching and understanding of student learning in this course or other courses?
• How can the findings inform future decisions and/or designs?
• Who will read the findings, and what would I like them to take away?
• What data sources have been used in similar projects in the past?

Some things to think about:

Sample size
Some data sources work best with large numbers, especially when the intent is to summarise attributes or study larger patterns: for example, quantitative surveys or tests intended to compare measures between groups. The definition of a “large” sample is subjective, but 30 subjects in each group of interest is a good minimum cut-off for statistical conclusions to be reasonably accurate. Quantitative results from analyses of small samples can sometimes be deceptive.

Data sources that work well with small numbers tend to emphasise depth or richness of data: for example, short answer questionnaires, interviews, focus groups, and think-aloud protocols. From a practical standpoint, the transcription that is often required to analyse this data limits the maximum number of participants that can be properly analysed.

Evaluating student learning
Grades alone will usually be inadequate to assess student learning. Grades are one measure, but they do not capture all information about students’ thinking, and the nature of the assignment and/or grading criteria will deeply affect what this measure means. To address this, make sure the grading is meaningful and gather additional information if necessary (a detailed marking rubric can provide more detailed information). Take time at the beginning of your project to consider what learning outcomes you are hoping to produce, and how these outcomes would be assessed in your students. You may want to collect additional information on students’ thinking by doing interviews or having them do specialised activities like concept maps. In assessing student learning, keep in mind that students will inevitably learn something regardless of the course or program, so this information alone is not helpful.
Having a basis for comparison for testing (learn better than who or what?) or an exploratory component is important.

Practical considerations and limitations
Think about the practical issue of whether you possess or have access to the knowledge and/or resources to make sense of the data you choose to collect. Can you reasonably collect this data given constraints and likely participation rates (e.g., trying to find students from past semesters can be very challenging)? While it is a good idea to collect multiple data sources to capitalise on their relative strengths and their ability to access different kinds of information, try to avoid collecting more data, and more kinds of data, than needed to answer the questions you pose. It can be tempting to collect as many kinds of data as you can think of, but you will likely run out of time and resources to analyse them all, and asking too much of participants can negatively affect the quality of their responses or participation (e.g., each subsequent survey in a semester is likely to have lower and lower participation).

Data sources
A brief overview of some of the major data sources/methods follows. Each data source has relative pros and cons. You will likely want to select more than one for your project and/or each research question.

Personal observations (reflections, notes)
These are easy to collect, and your thoughts and anecdotes can give useful detail and insight. This is good for exploratory projects. However, this kind of data is fundamentally subjective, and thus not appropriate for testing. It also provides no information on students’ perspectives and thus shouldn’t be your primary or exclusive data source.

Pros: easy to collect, may have deep contextualised insights
Cons: subjective, limited perspective

Student performance/thinking (grades, projects, concept maps)
Grades are always collected as part of a course and are easily understood by most.
This is a common source of data for testing projects. However, grades can mean fundamentally different things based on the activity being evaluated and the evaluation method itself. They also provide little detailed information on student learning or thinking. For instance, asking a student to answer fact-based questions on a quiz assesses a different level of understanding than asking a student to complete an “authentic” task (e.g., write a computer program; write a short story).

You can design specific assignments and/or activities to assess student thinking. Concept mapping is one such method. Another would be a complex project with a detailed grading rubric. In either case, consider what kind of learning the task students are asked to do demonstrates (is it what they know? Is it what they can do? Is this task like a “real-world” task?). This kind of more detailed analysis of student thinking can help serve either testing or exploratory purposes. However, it is generally more time consuming and difficult to analyse than grades alone.

In all cases, it should be noted that student performance and thinking can be difficult to attribute to a specific course or intervention, since students have individual differences (e.g., prior knowledge, time). This issue should be considered when designing your study. In general, simpler numerical measures such as grades work well for testing purposes, whereas more complex evaluations, such as detailed evaluations of concept maps or projects, work well for exploratory purposes.
Pros: may already be part of a course, easy to analyse statistically, many purposes can be served or assessed by changing the evaluation
Cons: grades do not tell the whole story about student thinking/performance, can be challenging to create an accurate measure, can be difficult to separate from individual differences

Surveys and survey responses (quantitative rating surveys and/or short answer surveys)
Surveys are a flexible tool that can quickly gather a large amount of data and be quickly analysed. Many surveys collect relatively undetailed information through simple response items and so are best for a testing purpose; however, focusing instead on short answer questions can support an exploratory purpose (a combination is of course possible as well). Surveys have many uses, including gathering general attitudes/opinions, comparing responses between subgroups, and looking at the relative average responses to individual questions. Put another way, surveys can be used to collect a variety of data. Quantitative data such as rating scales work best with relatively large sample sizes, and small samples can make numerical results less reliable. Qualitative data such as short written answers can work with smaller samples, since more detailed and unconstrained information can be gained from each response. It is of course possible to qualitatively survey large populations, but analysing the data may be very time consuming.

Survey research designs may include single surveys, pre-post designs (two different time points to assess change), post-pre designs (one time point but asking for retrospective responses for the “pre” component), and multiple longitudinal surveys (more than two different time points).

Some things to note:
Low response rates are the norm (e.g., 20-30%). Answered surveys may not necessarily provide quality information, as some people will answer without adequate consideration of each question (e.g., answering the middle response throughout).
It is important to look for this issue during analysis.

While pre-post designs are a commonly considered design, in practice they can be challenging to implement and get meaningful results from.
o Firstly, these designs work best when individual students can be linked from their pre-responses to their post-responses. Attrition and maintenance of confidentiality are thus common issues.
o A second common difficulty is a shift in how individuals interpret questions when answering for pre and post. For instance, it is common for people to self-rate their knowledge of a topic as lower after learning more about it, as they now know what they did not know!
o A post-pre survey design (where respondents are surveyed only once but asked what their “pre” response would have been on each question) can help address this issue.

Validated instruments are another approach. Typically, “best practice”, especially for quantitative research designs, requires a “validated” instrument whose questions have been statistically tested with many participants and refined over time. However, this is extremely time consuming and out of the scope of most teaching and learning research projects. Already-validated tests can be found, but they can be difficult and/or expensive to gain access to. They also tend to be relatively abstract in their wording and thus may not work well for specific projects, and are not generally recommended for these reasons. Adapting validated instruments is possible, but strictly speaking any modifications/adaptations would require a re-validation process. Thus, while validation is ideal for some purposes, for most TLR projects a carefully designed, non-validated survey is the recommended approach. Survey support and guides are available to Teaching and Learning Development grant recipients.

Another survey research design is to administer multiple surveys (generally 3 or more) to track changes in individuals’ responses over time.
Note that the concerns with pre-post designs also apply, and attrition is an even larger danger. This design, however, has the potential to create very in-depth results not possible under other designs, such as how opinions and/or learning change over time.

Qualitative surveys or survey questions are generally less complex to design. Analysis, however, may be difficult and time consuming depending on the nature of the responses received.

Pros: collect many kinds of information quickly and efficiently; quantitative data collected this way is easy and quick to analyse; qualitative data collected this way allows for diverse and unconstrained responses
Cons: a good survey is difficult to design; quantitative surveys collect undetailed information; quantitative surveys require large samples for reliable findings; low response rates, attrition, and low quality responses can make it difficult to interpret results; qualitative surveys can be time consuming to analyse

Interviews
Interviews can serve a variety of primarily exploratory purposes (some are described below). They are good for collecting highly detailed information on a person’s thinking and perspective. There is of course an inherent subjectivity that should be accounted for when interpreting/analysing them. They can also be time consuming to collect and transcribe. When conducting interviews it is common to construct an interview protocol which outlines the questions to be asked. This is especially important if there are multiple interviewers. Interviews can be structured, semi-structured, or unstructured depending on whether they use a protocol at all and how much they allow deviation from it to “probe”. Semi-structured is arguably the most common form, where most questions are asked but probes are common.

Some potential uses for interviews:
Interviews with students can help gain insight into their course experiences and/or their needs.
Interviews with TAs or instructors can help gain insight into their instructional perspective.
Interviews with experts (e.g., subject matter experts, other instructors) can provide a variety of insights and may be particularly helpful at the outset of a project to inform a design.

The focus group method is an interview variant that can efficiently access multiple opinions and perspectives. If considering a focus group, also see the Nominal Group Technique or Delphi Technique.

Some variants to consider:
The “think-aloud protocol” is a methodology common in social psychology and usability testing where individuals describe their thinking as they work through a task. This can help uncover how people think through problems and/or uncover misconceptions.

Pros: collects detailed information that can serve a variety of purposes, can access an individual’s thinking and perspective, multiple interviews can help develop common themes or consensus around issues
Cons: individual perspectives are subjective, interviews are time consuming to collect and transcribe

Observations
Observations can be useful for getting information on classroom dynamics, instructor or student behaviour, etc. This can be adapted to either testing or exploratory purposes. In the former, some sort of structured observation checklist completed by an “objective” observer is ideal, whereas the latter can be done by taking detailed notes. This kind of data can be highly convincing and is considered relatively objective. However, it is by its very nature time consuming and resource intensive, as it requires a dedicated observer. It may help and/or be important to have a knowledgeable or trained observer, such as an expert in the field.
Pros: collects “objective” evidence
Cons: time intensive to collect, needs a dedicated observer

Trace data and records
This is a large and general category, created for this guide, that includes course evaluations, attendance, Canvas records, clickers, and documents. It is a highly varied data source, and the purposes served will vary depending on what the data is. Generally speaking, these sources do not provide much detail on their own but can be widely collected/collated, and they can serve exploratory or testing purposes depending on what the data is and how it is used. In some cases they overlap with and/or substitute for student grades/performance (e.g., Canvas records or clickers). This data is easy to collect once the method for collecting it is established. Courses with an online component can allow trace data such as page views, video views, and the time of use to be collected automatically. In general, university records such as course evaluations or documents are relatively easy to locate and collect. However, in most cases these will be difficult to repurpose for the specific questions of a particular project.

Pros: can be easy to collect large amounts of data
Cons: can be difficult to analyse, may need to be adapted for particular purposes

Participant selection/recruitment
Who is participating in your research, or who will participate, is an important consideration. While in many cases your choices will be largely determined by the constraints of your project, it is important to consider the trade-offs and relative pros/cons of different methods. Some points to consider:

• Aim to recruit as many and as diverse participants as possible (e.g., through reminders and/or incentives). The more students participate, the better your findings will reflect the group as a whole.
• Whether the number of participants you have is appropriate for the data sources you are using (or vice versa).
• Who is actually participating in your study, and how this might affect/influence results.
o What are the attributes of your sample? Are the demographics of your sample somehow unusual relative to the total “population” of your whole class or program (whatever level your study is looking at)?
o Is your current “population” (e.g., class) the same as or different from other potential similar populations (e.g., classes from previous years)?
• Who isn’t participating in your study, and what you might not know because of this.
o Dropout over time can change who is not participating. Are certain kinds of students more likely to drop out?
o Students may not be participating to their full potential or effort. Students might, for instance, fill out the survey but answer the same neutral response throughout without reading the questions.
o In many research designs, especially in quantitative research, the ideal is that participants are randomly selected from a larger population, all of whom have an equal chance of being selected. This randomly selected “sample” can then be used to “generalise” to the entire population, even those who did not participate in the study. However, random selection is rarely possible in TLR, and you should consider how this more “limited” sample may affect your results. Consider/examine who is actually participating in your study and how their difference from a larger population of interest might impact results. For instance, if you study your class, how are they the same as or different from other students at SFU or in Canada?
• Random selection is seldom used in qualitative research, but it is still important to consider who your participants are and how their specific attributes may influence your findings.
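To make the contrast between random selection and the more typical "limited" sample concrete, the following is a minimal sketch in Python. The roster, class size, and sample size are all invented for illustration; in practice the roster would come from enrolment records.

```python
import random

# Hypothetical class roster of 120 students (invented names).
roster = [f"student_{i:03d}" for i in range(1, 121)]

random.seed(0)  # fixed seed so the sketch is repeatable

# Simple random sample: every student has an equal chance of selection,
# which is the property that supports generalising to the whole class.
random_sample = random.sample(roster, k=30)

# Convenience sample: whoever happens to volunteer, crudely mimicked here
# by taking the first 30 names. This is unlikely to be representative.
convenience_sample = roster[:30]

print(len(random_sample), len(set(random_sample)))
```

Note that `random.sample` draws without replacement, so the 30 selected students are all distinct; a convenience sample has no such statistical guarantees about who ends up included.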
• When developing a new method or tool, you want to select students (and perhaps faculty) who will actually use what you are developing or who are “like” those who will be using it (e.g., students in the same course in which you plan to eventually use what you are developing/building).
• While in the building and designing phase of developing a new method or tool, having a fast turn-around time for feedback is more important than gathering lots of feedback at once. Plan for multiple rounds of feedback, for example in the form of short surveys, expert interviews, or focus groups, so you can quickly use the feedback to make improvements and correct problems early. If conducting a survey for feedback, convenience sampling (described below) should be sufficient.
o Convenience sampling may lead to some students in a group being underrepresented (e.g., EAL students) because of their reluctance to volunteer information.
o However, your goal is to find the most pressing issues, if any, with the approach, tool, or method you are building, and deal with those. Most of these issues will be found by taking a convenience sample.
• Expert interviews may be helpful to inform your design in terms of content or pedagogical practices or techniques.
o For example, you may want to interview another instructor who has used a similar method, or an industry professional or content expert in the field.
o A technique such as the Delphi technique can be used to aggregate opinions and build consensus.
• If you want more detailed information, or wish to have a chance to ask further questions, a focus group is recommended.
o In a focus group, 4-10 volunteers give feedback together as a panel in an open discussion for a fixed time, usually 30-90 minutes.
o To prepare a focus group, have a set of open questions and short discussion activities, such as having your focus group collectively make a mind-map.
o Be prepared to offer an honorarium or gift certificate for people’s time.
o To find subjects for a focus group, purposive sampling (described below) is ideal.
o One potential problem with focus groups is that, because the discussion is open, certain members of the group may dominate the conversation or drown out dissenting opinions. Try to use question formats and activities that encourage less dominant voices to contribute.

Participant selection/recruitment methods

Convenience/non-random sampling:
“Convenience sampling” is a term for the method of sampling whoever is most available or convenient to get answers from. In the context of TLR this will often be students in your class who volunteer to participate in your research. In many projects convenience sampling is the default, as your choice of sample is obvious or a given. The main alternative is purely random selection; however, this is usually not possible for TLR projects (though random grouping may be; for details see below). That is, you cannot, especially in a large class, intentionally choose who will participate: your participants are your students, or they are people you have access to (e.g., experts you know). You cannot usually compel all students to participate (although offering incentives can help increase participation rates). As noted above, you should consider who actually participated. There are two variations/alternatives to convenience/non-random sampling which are discussed subsequently (purposive and snowball sampling).

Qualitative
For qualitative research such as interviews, you should keep in mind who may or may not participate, consider how this might influence the perspectives you hear, and consider how you might entice a more diverse sample to participate. For example, interviews are typically time consuming and optional, so it is likely that you will hear from highly motivated students and those who have particularly strong opinions to share.
You should also consider students who are less likely to participate, and whose perspectives will be missing; for example, given the highly language-intensive nature of interviews, participants who are not confident in their English language skills. You may also have trouble finding students to participate more generally. You can try to mitigate some of these issues by attempting to recruit more students through snowball sampling and/or targeting your sampling through purposive sampling (see below).

Quantitative
Voluntary class surveys are always a convenience sample because you will only be getting information from the people who are willing to take the time to provide it. This includes the student surveys given at the end of each course. This is the easiest way to collect information, but the results have some major limitations. When taking or analysing a convenience sample, it is important to be aware of the kinds of students that are more likely to answer a voluntary survey or be available to sample (e.g., highly engaged students, or those without language difficulties). These kinds of students will be over-represented in a convenience sample, which makes generalisation to a group beyond the sample unreliable. That is, it is hard to tell if the survey responses represent your class as a whole, and it is even more difficult (if not impossible) to say how this survey could apply to other classes or students more generally. However, if your goal is to collect whatever feedback people will volunteer, or if you are not worried about generalising the findings beyond the sample that was found (i.e., assuming your findings/conclusions apply to others), then a convenience sample is sufficient. These limited surveys can be informative so long as you recognise the general demographics of who actually answered (e.g., these results tell me what strong and motivated students think). Most projects will draw on these types of samples.
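One simple way to "recognise the general demographics of who actually answered" is to compute the response rate and compare each group's share of the class with its share of respondents. The sketch below uses invented numbers purely for illustration; the group labels are assumptions, not a recommended demographic breakdown.

```python
# Invented illustrative numbers: class makeup vs. who answered the survey.
class_counts = {"first_year": 80, "upper_year": 40}
respondent_counts = {"first_year": 18, "upper_year": 18}

class_size = sum(class_counts.values())          # 120 students enrolled
n_respondents = sum(respondent_counts.values())  # 36 answered

response_rate = n_respondents / class_size
print(f"response rate: {response_rate:.0%}")

# A large gap between a group's share of the class and its share of
# respondents signals over- or under-representation in the sample.
for group in class_counts:
    class_share = class_counts[group] / class_size
    sample_share = respondent_counts[group] / n_respondents
    print(f"{group}: {class_share:.0%} of class, {sample_share:.0%} of respondents")
```

With these invented numbers the response rate is 30%, and upper-year students make up a third of the class but half of the respondents, exactly the kind of over-representation to note when interpreting results.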
It is important to note that the degree of the problem with convenience sampling within a natural group such as a classroom varies with your "response rate": the more participants there are, the more representative your sample is of your class (and if everyone responds it is perfectly representative!). Incentives to participate can help increase the response rate (though they still cannot guarantee honest effort by all students). The problem of generalisability, however, remains, as your class is not a random assembly of all possible students (see natural/existing grouping below for more information). If generalisation to a wider group is important, a random sample from a larger "population" (e.g., a class or cohort) is typically needed (see below). It is possible to "adjust" for the biases introduced by non-random sampling through advanced statistical techniques such as propensity matching, but this is beyond the scope or requirements of most projects.

Pros: Easy to collect; the "default" sampling method.
Cons: Not an ideal means of sampling, as you have no control over who participates. Some "kinds" of students may participate at far greater rates than others. This is especially problematic for testing purposes and some quantitative analyses, particularly if the goal is to generalise beyond your sample.

Purposive and snowball sampling

These are non-random techniques for recruiting a sample for TLR. Their particularities are discussed below. Because you can tailor your invitations to particular individuals, both approaches are likely to yield a higher invitation-to-recruitment ratio than purely convenience or random methods.

Purposive sampling

Purposive sampling is primarily a qualitative and exploratory technique. It refers to intentionally selecting a certain participant or participants for specific reasons (e.g., demographics, expertise, a unique perspective).
You should carefully consider what group(s) may have insights that would be helpful. You can purposively select individuals from multiple groups. Possible sampling approaches include:

Multiple perspectives: you may want to collect a variety of specific views on the same phenomenon. For example, you may want an instructor, a TA, a high-achieving student, and a struggling student to share their views on the same issue or method.

Experts: you seek out the members of a community of interest with the most valuable insights. These members could be instructors, subject matter experts, or administrators.

Extreme cases: you seek out notable cases, such as students at the top of the class or students at the highest risk of failing a course.

Critical cases: you identify specific cases which together will be revealing for some aspect of study. For example, if threshold concepts are of interest, students who are barely passing and may be missing those concepts could be good candidates for critical cases.

One major advantage of purposive sampling is that it focuses strongly on a particular group tailored to the intent of the research, and it aligns well with both qualitative and exploratory forms of research. It is likely you will know these individuals and/or your requests for participation can be tailored, so response rates may be higher (though typically relatively few participants are sought in these designs). However, purposive sampling is not typically appropriate for quantitative designs due to its emphasis on individual researcher judgement and its lack of large-scale sampling or randomisation. Further, this sampling method is inherently limited by the researcher's awareness, so there may be some important individuals you are not aware of and thus will not sample.
Pros: Can provide strong and significant qualitative insights if targeted properly; can achieve good participation rates, as targeted recruitment is more likely to secure participation.
Cons: Success relies on being able to identify and convince specific individuals to participate; not generally appropriate for quantitative designs.

Snowball sampling

The term "snowball sampling" comes from the way a snowball rolled along the ground collects more snow into itself. In a sense, the snow in the ball is being used to "recruit" snow from the ground. Likewise, a snowball sample is one in which members already in the sample are used to recruit additional members. To conduct a snowball sample, a small part of the target sample is found first, preferably by random selection or, pragmatically, by purposive selection. Each of the people added to the sample is then asked to recruit people they know (sometimes with specific instructions to recruit those who fit the sample of interest). In general, people are more likely to participate when asked by someone they know personally (e.g., another student), so this will often yield a higher response rate. This technique is commonly used in qualitative designs and can be used in quantitative designs as well. Snowball sampling may be used to try to raise interest in and response to research, as blanket requests for participation sometimes attract few participants. Additionally, snowball sampling is used in situations where specific individuals are rare or hard to recruit with random sampling, such as additional-language speakers, students with mental health issues, or students with particular interests. This method can also access and identify networks of individuals, such as a support network of students working together and tutoring each other. In these situations, where a certain part of the population is hard to find, snowball sampling can produce a much larger sample than random or purely convenience methods.
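The wave-by-wave logic of snowball recruitment can be sketched as follows (Python; the contact network is entirely invented for illustration):

```python
# Hypothetical friendship network (assumed data): each student names
# peers they could pass the invitation along to.
contacts = {
    "a": ["b", "c"], "b": ["a", "d"], "c": ["e"],
    "d": ["f"], "e": ["f", "g"], "f": [], "g": ["h"], "h": [],
}

def snowball(seeds, waves):
    """Recruit starting from `seeds`; each new recruit invites their contacts."""
    sample = list(seeds)
    frontier = list(seeds)
    for _ in range(waves):
        invited = []
        for person in frontier:
            for peer in contacts.get(person, []):
                if peer not in sample:   # avoid re-recruiting
                    sample.append(peer)
                    invited.append(peer)
        frontier = invited               # next wave starts from new recruits
    return sample

sample = snowball(["a"], waves=2)
```

Note how student "h", reachable only through a long chain of contacts, never enters a two-wave sample: this is the connectedness bias discussed above made visible.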
More generally, this method often gets higher participation than purely convenience or random methods. For qualitative research it is a commonly used and accepted technique; however, note that it is not as targeted as purposive sampling, and, as is always the case, you will want to carefully study and understand the attributes of those who actually participate. The major quantitative drawback to snowball sampling is that it isn't random. Members of the population who are easier to find directly, or who are relatively well connected in their community, are more likely to be included in the sample than others. Thus statistical analyses are often limited in terms of their generalisability, as with convenience sampling. As previously noted, there are statistical means to correct for the potential bias of such limited samples, but this is out of the scope of most projects.

Pros: May facilitate higher response rates and access to difficult-to-identify/recruit subsamples or individuals.
Cons: Produces an inherently limited sample whose participants cannot be fully anticipated; challenging for quantitative designs.

Grouping participants:

The goal is not always to simply sample/study all available "students" or "experts" similarly or at once. Oftentimes there is a need to create groups or subgroups in TLR, especially for quantitative testing designs. Most commonly, you need groups in order to assess differences or make comparisons. The prototypical example is an experimental design such as a drug study where one group gets a placebo (i.e., a sugar pill; the "control" condition) and another group gets the actual treatment (the new drug). The goal is to see if there is a difference between the groups. You would expect the health outcomes in the treatment group to be better.
Since the groups are assumed to be largely similar (including both receiving a pill; see below for variations), the change can be attributed to the treatment. In experimental designs these are typically called "control" and "treatment" groups. In a classroom you wouldn't be testing drugs, of course, but the logic remains the same, and the treatment can just as easily be a new instructional tool or method. An important point is that comparisons need not be between no treatment and a new treatment. It is valid (as is frequently done in drug studies as well) to compare "old" treatments to "new" ones; it's often not interesting or important to know that something is simply better than nothing! Some question the ethics of giving a group what might be an "inferior" treatment. However, the old treatment isn't always the inferior one: in most cases the old treatment is known to produce at least a mildly positive outcome, and it's possible the new treatment doesn't work at all or is even negative! Another way to address this potential problem is by allowing all students to receive all "conditions" through counterbalancing (see below).

Note that the classroom is not a laboratory. Thus, TLR has particular challenges which you should keep in mind:
• How will the various people (including researchers and participants) involved in the project directly or indirectly influence my results?
• What are some elements which are beyond my control but that I should keep in mind? (e.g., time of course, students' prior knowledge)

Grouping is typically a concern in quantitative rather than qualitative research. While you can form groups in qualitative research, this does not typically significantly affect the analysis or how you approach findings, beyond the considerations mentioned in the data sources section above.
In quantitative designs, however, the requirements and assumptions of statistical analyses require careful consideration of how students are grouped. Thus the sections below refer primarily to quantitative research.

Natural/existing grouping

Most projects will have to work with existing "natural" groupings. For instance, you can try different treatments with different tutorial sections or class sections. The disadvantage of this approach over a purely random one, from a research standpoint, is that the samples are likely to be biased. For instance, students who choose a morning tutorial likely differ in a variety of important ways from those who choose an evening one. All the challenges previously noted as inherent in non-random/convenience sampling apply here as well. It is, as always, important to understand the attributes of the samples/participants you have.

Note that natural groups are not always less desirable than randomly assigned groups. Sometimes natural groups may be intentionally selected; for instance, a class that is streamed into two sections: one where students have certain pre-requisite courses and another where students lack these pre-requisites. These natural groups have different attributes (pre-requisites), and a variety of designs can be applied to them. For example, you may want to see how differently these groups respond to a new instructional technique (does it work better with struggling students?). Alternately, you may want to try a new program meant to support the students who lack pre-requisites and historically do poorly in the course, with the aim of "closing" the performance gap between the two groups. In these cases the usual caveats regarding non-random samples apply, but you will likely have more knowledge about the factors influencing your results than you might otherwise.
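When comparing natural groups, it helps to report not only each group's outcome but also the attributes on which the groups already differ. A minimal sketch (Python; the sections, prerequisite flags, and scores are invented data):

```python
from statistics import mean

# Hypothetical records for two existing tutorial sections:
# (section, has_prerequisite, exam_score)
records = [
    ("morning", True, 78), ("morning", True, 82), ("morning", False, 70),
    ("evening", False, 61), ("evening", True, 75), ("evening", False, 58),
]

def summarise(section):
    """Outcome plus a known covariate for one natural group."""
    rows = [r for r in records if r[0] == section]
    return {
        "n": len(rows),
        "mean_score": mean(r[2] for r in rows),
        "prereq_rate": mean(1 if r[1] else 0 for r in rows),
    }

morning = summarise("morning")
evening = summarise("evening")
# A raw difference in mean_score mixes the treatment effect with the
# sections' differing prerequisite rates -- report both, not just scores.
```

The point of the `prereq_rate` column is the caveat above: a raw score gap between sections may reflect who enrolled in each section rather than any treatment.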
Another design that uses natural or existing groupings is one that groups students based on demographic attributes. Normally this only requires identifying the groups students belong to at the data collection phase and need not involve treating the students differently during the study; the groupings are then handled at the analysis stage. For example, you may ask students to identify their major and then later compare student performance between majors, or compare the effects of different treatments on students in different majors. Note that a natural group doesn't have to receive only one treatment, as in a counterbalancing design (see below).

Pros: Simple to group and manage; natural groupings can be desirable/interesting.
Cons: A non-random sample has inherent biases, and generalisability isn't possible.

Counterbalancing

Counterbalancing designs are not a means of grouping but a way of assigning different conditions to groups. This design involves subgroups that typically alternate treatments (and/or control). For example, if you were trying to compare two instructional methods, such as a new one (N) and a traditional one (T), you could design an experiment like so:
- Split the student body into two groups. This may already be done by way of class or lab sections.
- Teach the first group with the traditional method in weeks 1-6 before the midterm, and use the new method in weeks 7-12.
- Teach the second group with the new method in weeks 1-6, and use the traditional method in weeks 7-12.

This type of design addresses several common issues. It addresses the concern about a portion of students not receiving the "new" treatment, since it is possible for all students to experience all conditions (e.g., instructional methods). Counterbalancing also effectively "doubles" your sample size, since each student serves in both the "control" and the "experimental/treatment" group.
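The split-and-swap schedule described above can be sketched as follows (Python; the group labels and week split are assumptions for illustration):

```python
def counterbalance(students):
    """Deal students into the two treatment orders for a T/N comparison."""
    group_tn = students[0::2]   # traditional in weeks 1-6, new in weeks 7-12
    group_nt = students[1::2]   # new in weeks 1-6, traditional in weeks 7-12
    return {
        "T-then-N": group_tn,
        "N-then-T": group_nt,
    }

schedule = counterbalance(["s1", "s2", "s3", "s4", "s5", "s6"])
```

Alternating by list position is only a stand-in: if your class list is ordered meaningfully (e.g., alphabetically), shuffle it first so the two subgroups are effectively random.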
Having students collectively experience every "order" of the treatments also serves to address "order" effects; that is, results may change depending on the order in which the treatments are received. By covering every order/combination possible, you can later statistically "adjust" for differences based on order.

Note that unless students are randomised into these subgroups, the general problems with non-random samples will apply (e.g., the groups may differ in attributes such as background knowledge). It may be helpful to establish a baseline, either with a pre-test or with an indirect measure such as cGPA, to determine how/if the groups differ and/or to correct for the difference statistically later.

In the example given at the beginning of this section, the counterbalanced design is "fairer" than having one group get the same method for all 12 weeks, because the order of the methods is counterbalanced between the groups. At analysis you can compare the two methods using the midterm grades of the two groups, and you can look for order effects by comparing the final grades. Assuming the students in each group have the same background skill at the beginning of the course, a difference in the final exam scores between the two groups is evidence that the order in which the two teaching methods were delivered made a difference.

Pros: This design allows all participants to experience all conditions, which may benefit participants, and it can provide more data for the respective conditions. It can also offset "order effects," where the order of conditions affects results (e.g., receiving A first improves performance in B, but not vice versa).
Cons: If subgroups are not randomly assigned, the same challenges as natural/existing groupings apply; can be complex in terms of organisation and analysis.

For more details, see http://www.unc.edu/courses/2008spring/psyc/270/001/counterbalancing.html

Random grouping

Random sampling is especially important in quantitative research, and especially in testing designs (by contrast, it is rarely done or considered in qualitative designs). This is because many statistical analyses require, for results to be generalisable and valid (especially to groups beyond the sample at hand), that the data be comprised of samples drawn randomly, with each student having an equal chance of being in each group. In part, this is needed because random sampling typically "averages out" factors other than the one you are interested in studying that might influence your results. For example, suppose you have two different instructional methods: a new one you think will increase test scores and an old one expected to produce relatively lower test scores. In an ideal situation you would randomly assign students into two equal groups, one that gets the new instruction and one that gets the old. The assumption here is that the only difference between the groups is the instructional method (which thus accounts for any differences in test scores you find). That is, you assume that, thanks to the random assignment, each group should have similar gender ratios, academic ability, language ability, level of wakefulness in class, etc., apart from receiving a different instructional method.

Practically speaking, it is often difficult to randomly assign participants in a TLR project, since students often by necessity make choices about the various groups they join (e.g., teams, course sections, TA tutorial sections). Thus samples are commonly natural/existing groupings (see above). It is possible to create randomised subgroups around different treatments within a class through counterbalancing (see above).
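Where random assignment is feasible, the shuffle-and-deal procedure is short. A minimal sketch (Python; the student IDs are placeholders, and the fixed seed is only to make the sketch reproducible):

```python
import random

random.seed(42)  # fixed seed so this illustration is reproducible

def random_groups(students, k=2):
    """Shuffle the class and deal students into k near-equal groups."""
    pool = list(students)        # copy so the original roster is untouched
    random.shuffle(pool)
    return [pool[i::k] for i in range(k)]

groups = random_groups([f"s{i}" for i in range(20)])
```

Because every student had an equal chance of landing in either group, differences in background factors should, on average, wash out between the two groups.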
Sometimes the goal is not only to select randomly, but to produce a particular distribution of participants with that random selection; random selection alone is sometimes not enough to make reasonable generalisations about smaller subgroups. Stratified sampling (see appendix) can be used to address this.

Pros: Ideal for statistical research; needed to make generalisable claims.
Cons: Can be difficult to implement in TLR, as most projects have to rely on existing natural groupings.

Stratified grouping

Stratified sampling, or two-stage sampling, is a method for use when your population of interest naturally separates into groups or "strata". The aim of this technique is to create groups with specific attributes that pure random assignment is unlikely or unable to achieve (for instance, if you want one of your groups to be comprised of students with a relatively rare attribute, such as a specific major). Stratified samples are useful when there are many groups and you would prefer to have lots of information from specific "kinds" of individuals or groups rather than sparse information on everyone generally. The first stage of a stratified sample is to identify the attributes you would like to use to define the strata and to sort students into these sampling "pools" (e.g., by major). The second stage is to predetermine the composition of each group and then randomly assign students from the pools into the groups until these quotas are met (e.g., group 1: arts; group 2: engineering). The strata could also be natural groupings, used to ensure good representation of certain subgroups. For example, if a large classroom was broken into dozens of small study groups, you could select 10 of the groups at random and look at the exam results of two students from each group.
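The two-stage procedure just described can be sketched as follows (Python; the roster, majors, and quotas are invented, and the fixed seed only makes the sketch reproducible):

```python
import random

random.seed(7)  # fixed seed for a reproducible illustration

# Hypothetical roster tagged with majors: 30 arts students, 10 engineering.
roster = [("s%d" % i, "arts" if i < 30 else "engineering") for i in range(40)]

def stratified_sample(roster, quotas):
    """Stage 1: sort students into strata. Stage 2: random draw per quota."""
    strata = {}
    for student, major in roster:           # stage 1: build the pools
        strata.setdefault(major, []).append(student)
    sample = {}
    for major, quota in quotas.items():     # stage 2: random draw per pool
        sample[major] = random.sample(strata[major], quota)
    return sample

sample = stratified_sample(roster, {"arts": 5, "engineering": 5})
```

Note that a purely random draw of 10 students from this roster would, on average, include only two or three engineering students; the quota guarantees five from each stratum.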
A major advantage of stratified sampling is that, when done properly, each student with a specific attribute has an equal chance of representation, regardless of how common or rare that attribute is in the whole group, which can facilitate generalisability for certain subgroups. One major drawback to stratified sampling is its complexity: it involves identifying strata and performing two rounds of random selection. Also, because the sample is not truly "random", the ability to generalise from it relies heavily on knowing what the population you want to "generalise to" is like, in order to make valid statistical inferences and analyses.

Pros: Can address some weaknesses of random sampling; can help study smaller subgroups that may not be properly represented otherwise.
Cons: Complex to organise and track.

Teaching and Learning Project Types and Examples

The following process was developed for Institute for the Study of Teaching and Learning in the Disciplines projects. It is divided into four steps which guide those conducting Teaching and Learning projects towards the general elements of their designs and suggested methods.

Overview of the process: Steps 1 and 2 will help you get started by identifying the purpose(s) and general type of inquiry you will be conducting. Each, in turn, has recommended data sources (step 3) and participant recruitment/grouping methods (step 4) for given project types.

Step 1: Identifying your purpose(s)

There are two overlapping purposes for Teaching and Learning Research (TLR). You should consider your relative interests/emphases for each research question and for your project as a whole.

Exploratory
• Gathering in-depth information on why/how something worked (or didn't)
• Typically qualitative and exploratory
• Typically has a formative intent (aims to inform change: what worked or didn't work? why did this happen?)
• Findings speak primarily to your particular class or situation

Testing:
• Determining if or how well something "worked" (or did not)
• Typically quantitative and hypothesis driven
• Typically has a summative intent (aims to inform about what happened)
• May aim to "generalise" findings to other classes/situations beyond your study
• Significant prior information is usually needed to do this. A testing project, for example, may emerge from and be informed by the results of an exploratory project.

Step 2: Identifying your project type

Project Type 1: Designing/building a new instructional approach, tool or method
Project Type 2: Evaluating an existing instructional approach, tool or method
Project Type 3: Evaluating a complete course design or redesign
Project Type 4: Program evaluation

Follow the flow chart below to find the type of project most likely to describe your own.

Project Type 1: Designing/building a new instructional approach, tool or method

The project focuses on developing or adapting something new and usually has a primarily exploratory purpose.

Project focus:
• I want to develop a new active learning activity for my lecture
• I want to create a new technology/simulation that will support students' learning
• I want to develop a new way of evaluating my students

The focus here is on developing something for your course. Note that this type of project will often overlap with the subsequent type (evaluation). The relative emphasis between these purposes will depend on how much relative effort and need there is for design and/or evaluation (for example, whether something already exists and/or how much adaptation will be needed for your course). In the design phase, it is likely that you will not have research data to work with, as your design will be relatively new and untested for your course.
You can, however, try to inform your design in a variety of ways. You can perform a literature search to find similar designs and pick out the features of those designs that you think might be helpful and/or relevant. You can consult with experts in a structured (Delphi technique) or semi-structured (interviews) way. These could be experts on the material, to determine what content might be relevant, or experts on the type of instruction, to determine what features and/or adaptations may be needed for your course.

The "regular" process for a design-focused project is build-pilot-revise. That is, you design your new tool/method/evaluation, implement/test/explore the implementation, then make revisions or recommendations for revisions based on this. This cycle can be repeated multiple times, though most projects, especially single-phase ones, will likely only complete the cycle once. For details on how to develop research insights from your pilot, see the following project type.

Note that, for the most part, design-focused projects should normally give the "exploratory" research purpose more emphasis than testing, as it will provide the detail needed to tweak the design.

See the next project type (Project Type 2: Evaluating an existing instructional approach, tool or method) for design examples and data sources for performing research on the implementation.

Project examples

Title of project: Future of the Book: Pedagogical Tool for English Literature Students
Grant recipient: Margaret Linley, Department of English

Title of project: Using Digital Humanities to Teach How Historians Think
Grant recipient: Elise Chenier, Department of History

Project Type 2: Evaluating an existing instructional approach, tool or method

The project focuses on evaluation (whether and/or how well something worked). It may be a part of the previous project type. Evaluations may have exploratory or testing purposes or foci.
Project focus:
• I want to incorporate and evaluate an active learning activity in my lecture
• I want to integrate and evaluate an existing technology/simulation that will support students' learning
• I want to assess the effectiveness of a new way of evaluating my students

The focus is on replacing or introducing something in an existing course. When designing this, prior to the evaluation phase you want to begin to identify what key features of your design you want to assess. What about this tool/method/evaluation are you most interested in? What makes it special/different? What does "success" look like? What kind(s) of student learning are emphasised?

Exploratory: This can be used in addition to, or instead of, testing. It is especially important if you plan to refine or improve the tool/method/evaluation in future semesters or with other instructors. You will want to identify and focus on the features or elements that you feel are most important and gather information about these specifically. For example, if you are developing a video, what is it about the video that is especially helpful or unhelpful to students?

Data source suggestions (exploratory):
Short answer surveys
Qualitative student performance/thinking measures (e.g., concept maps, reflections/diaries, evaluating complex assignments such as portfolios or projects)
Observations by an external evaluator (e.g., instructor, expert in the field)
Interviews and/or focus groups (students, TAs)
Think-aloud interview protocols (ask students to describe their thinking process as they work)

Testing: This lends itself particularly well to testing because oftentimes what you replace/introduce is the only thing that has changed, so you can compare to a previous semester or even to different sections/groups/cohorts within the course (if too many things change, it can be hard to tell how to account for changes in performance).
In industrial settings, such as when Google wishes to try a new interface, such block-and-compare methods are called "A/B testing". In traditional statistics, they are referred to as "experimental design". Keep in mind that one of these groups doesn't necessarily have to be a "control" in the sense of getting "nothing" or a "placebo". The comparison can be a different treatment, or even just the old method that you already know is somewhat effective. This purpose can be difficult to achieve, however, if you have no basis of comparison.

Design examples:

Comparing one group of students within a class with another (e.g., one group gets one method and another group gets a different/old method)
• Students can get different methods at different points of the course and be evaluated at those points
• If you have a large number of students, quantitative methods (e.g., surveys) may be best
• If you have a smaller number of students, qualitative methods (e.g., interviews, short answer surveys) may be best

Comparing students from this offering of the course to a previous offering
• If it is a past class you can use grades or assignments.
It is important/ideal that the course has only "changed" in terms of what you have developed/added
• If this is a multi-semester project, you can collect data such as surveys or interviews and compare or track the data over time

Data source suggestions (testing):
Surveys
Quantitative student performance/thinking measures (e.g., scored or graded assignments relevant to the project focus)
Trace data and records (attendance, Canvas, clickers)

Project example(s):

Exploratory:
Title of project: Development of instructional videos to improve students' techniques in General Chemistry Laboratory courses
Faculty Investigator: Sophie Lavieri (Chemistry)

Title of project: Mapping Expatriate Paris, 1800-1960
Faculty Investigator(s): Colette Colligan and Michelle Levy (English)

Title of project: Student Response to Instructor Feedback on Writing
Faculty Investigator(s): Marti Sevier, English for Academic Success/Linguistics

Testing:
Title of project: Comparing regular session, i>clicker and online-tutorials: Exploring student experiences and learning outcomes
Faculty Investigator: Sheri Fabian and Barry Cartwright (Criminology)

Project Type 3: Evaluating a complete course design or redesign

The project is similar to both designing and evaluating an instructional tool/method, but evaluates multiple changes, complete redesigns, or even new courses. The purpose is likely exploratory.

Project focus:
• I am designing a new W course
• I am updating a course to better meet student needs
• I am updating a core course in my program
• I want to implement a flipped classroom

These projects often involve making multiple and/or fundamental changes to a course, or designing a completely new course. In designing this, keep in mind what the overall goal(s) of the course is, and how you see the various elements contributing to that goal or those goals. How do the different modules relate to each other? Which elements are most critical or likely problematic? What are their common goal(s)?
What are the anticipated learning outcomes of the course? What do you want students to be able to do after the course? This type of project shares much in common with the design of an instructional tool/method/evaluation, and many of the notes there apply here; the difference is mainly in complexity and scope.

Exploratory: This purpose is arguably the most appropriate, since the complexity of, and lack of precedent for, a complete redesign makes it difficult or impossible to easily determine the "effectiveness" of individual elements. You will want to gather formative information and focus on observations (your own, TAs') and student feedback. You want to collect data and information that will support you in knowing how to improve/change/remove/add elements of the course design going forward. Did students see the relevance of and/or link between two activities? Did TAs find certain activities/modules too difficult or impractical?

Data source suggestions (exploratory):
See the exploratory data source suggestions for project type 2 above.

You may want to conduct research prior to the design phase to help inform your design. A literature search, a survey of previous students, or consultation with experts (e.g., subject matter experts, instructors) may be helpful for this.

Testing: This purpose is relatively limited, as there is usually little to meaningfully compare to if the course is effectively new. It is not recommended that this be the only or main focus of your inquiry. Course redesigns often involve making many changes to an existing course at once, so inferences about specific aspects of the course are difficult or impossible to make. However, comparisons can be important, particularly in the long term; for example, if you want to establish that the new offering of the course is "effective" or valuable enough to continue to be offered.
In this case you want to consider what effectiveness means to you (e.g., grades, student interest, scalability, cost, student retention, performance in future related classes).

Data source suggestions (testing):
• Collecting and comparing course evaluations
• Collecting and comparing grade distributions from previous course offerings
• Collecting subsequent student outcomes (e.g., enrolment, grades in the next course)

Project example(s):

Has exploratory focus:
Title of project: Increasing Opportunities for the Development of Case-Based Knowledge in High Enrolment Design/Production Courses
Faculty Investigator: Michael Filimowicz (SIAT)

Has testing and exploratory focus:
Title of project: Development of a New Course: ENSC 180 Introduction to Engineering Analysis Tools
Faculty Investigator: Ivan V. Bajić, School of Engineering Science

Title of project: Flip the Classroom: An Investigation of the Use of Pre-Recorded Video Lectures and Its Impact on Student and Instructor Experience in Two First-Year Calculus Courses
Principal Investigators: Veselin Jungić and Jamie Mulholland, Department of Mathematics; Cindy Xin, Teaching and Learning Centre (TLC)
Project team: Harpreet Kaur, research assistant

Project Type 4: Program evaluation

Project focus:
• We are redesigning the course sequence in our program
• We are determining if the program meets student needs
• We are matching learning outcomes to assessment points

These are the largest-scale projects and involve the largest number of stakeholders. They do not focus on individual elements of courses or on individual courses. The main emphasis is on collecting and organising a large amount of disparate information.

Exploratory: This is the likely focus. The range of possible questions is large and will depend heavily on your goals and context. Some potential areas of inquiry include:
• What have other similar programs done, and what ideas can I incorporate?
• Are we meeting our stated program goals?
• What parts of this program are working, and which ones need improvement or replacement?
• How do the courses relate to one another?
• What factors are associated with student success in this program?
• What do we anticipate students should be able to do or should know by the end of the program?
• What do students think of the program, and what do they do afterwards?

Testing: This is unlikely to be applicable for these kinds of projects, as there are far too many elements to compare and likely no reasonable comparison possible.

Data source suggestions:
• Surveys of recent graduates
• Surveys of current students
• Focus groups
• Interviews with stakeholders (e.g., faculty, students, professional organisations)
• Examining course evaluations across courses and time
• Examining program documentation/reports for similar programs

Example(s):

Title of project: Evaluation of Student Perspectives on Their Learning Experiences in Biomedical Physiology and Kinesiology
Principal Investigator: Victoria Claydon, Department of Biomedical Physiology and Kinesiology (BPK)

Title of project: Transitioning to Outcome Based Education: Optimizing the Mapping of Graduate Attribute Indicators to the Curriculum of the School of Engineering Science
Principal Investigator: Michael Sjoerdsma, School of Engineering Science

Title of project: The Academic Enhancement Program: Evaluation of Expansion to the School of Engineering Science
Principal Investigators: Diana Cukierman, School of Computing Science, and Donna McGee Thompson, Student Learning Commons