What Can We Learn from Quantitative Data in Statistics Education Research?

Sterling Hilton, Brigham Young University
Andy Zieffler, University of Minnesota
John Holcomb, Cleveland State University
Marsha Lovett, Carnegie Mellon University

University of Minnesota, Educational Psychology

Introduction: Components of a research program
- Generate ideas (pre-clinical)
- Frame the question (pre-clinical, Phase I): develop a conceptual framework; constructs and measurement; design and methods; pilot study
- Examine the question (Phase I, Phase II): establish efficacy (small studies)
- Generalize findings (Phase III): larger studies in varied settings
- Extend findings (Phase IV): longitudinal studies; different populations

Introduction: Quantitative methods in a research program
- Framing: measurement development (validity and reliability)
- Framing: pilot study
- Examine
- Generalize
- Extend
Statistics education research is currently concentrated in the “generate” and “frame” phases.

Introduction: Purpose
Introduce two instruments that are at different stages of development, and discuss how they have been, and might be, used in statistics education research:
- Comprehensive Assessment of Outcomes in a First Statistics course (CAOS)
- Survey of Attitudes Toward Statistics (SATS)

Assessment Resource Tools for Improving Statistical Thinking (ARTIST)
Several online assessments:
- ARTIST Topic Scales
- Comprehensive Assessment of Outcomes in a First Statistics course (CAOS)
- Statistics Thinking and Reasoning Test (START)

ARTIST Topic Scales
7-15 multiple-choice (MC) items per scale, covering many topics: data collection, data representation, measures of center, measures of spread, the normal distribution, probability, bivariate quantitative data, bivariate categorical data, sampling distributions, confidence intervals, and significance tests.

CAOS Test
40 MC items, designed to assess students’ statistical reasoning after any first course in statistics. The CAOS test focuses on statistical literacy and conceptual understanding, with particular emphasis on reasoning about variability.
The test was developed through a three-year process of acquiring and writing items, testing and revising items, and gathering evidence of reliability and validity.

CAOS Test
Reliability analysis: in a sample of 10,287 respondents, Cronbach’s alpha coefficient was .77.
Content validity evidence from 18 expert raters:
- Raters unanimously agreed that CAOS measures important basic learning outcomes.
- All raters agreed with the statement “CAOS measures outcomes for which I would be disappointed if they were not achieved by students who succeed in my statistics courses.”
- Some raters indicated topics they felt were missing from the scale, but these raters did not agree on which topics were missing.

START Test
14 MC items, identified through a principal components analysis of CAOS data gathered in Fall 2005 and Spring 2006 (n = 1470). The alpha coefficient for that data set was 0.74.

Use of Quantitative Measures in a Phase I Study
Exploratory studies ask:
- What can we find out about students’ understanding?
- Where are students having difficulties?
- Are there inconsistencies in students’ reasoning?

Example Item 1
Measured learning outcome: understanding the interpretation of a median in the context of boxplots.
The two boxplots below display final exam scores for all students in two different sections of the same course.
[Figure: boxplots of final exam scores for Section A and Section B]
Which section has a greater percentage of students with scores at or above 80?
a) Section A
b) Section B
c) Both sections are about equal.

How did students answer this item?
Response (N = 754)                     Pretest   Posttest
a) Section A                             73.7%      65.6%
b) Section B                              6.6%       6.1%
c) Both sections are about equal.        19.6%      28.2%

Is this surprising? What can we learn from students’ responses to this item? What are the implications and directions for research? For teaching?
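The reliability figures reported for CAOS (α = .77) and START (α = .74) are Cronbach’s alpha coefficients. As a minimal sketch of what that statistic computes, the Python function below applies the standard formula α = k/(k - 1) × (1 - Σ item variances / variance of total scores) to a respondents-by-items score matrix. The function name and the 0/1 toy data are hypothetical illustrations, not CAOS or START data.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondent rows, each holding k item scores."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical toy data: 5 respondents x 4 items, scored 1 = correct, 0 = incorrect.
toy = [
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(toy), 3))  # 0.8
```

Alpha summarizes internal consistency only; it says nothing about validity, which is why CAOS also reports separate content-validity evidence.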
Example Item 2
Measured learning outcome: understanding that correlation does not imply causation.
Researchers surveyed 1,000 randomly selected adults in the U.S. A statistically significant, strong positive correlation was found between income level and the number of containers of recycling they typically collect in a week. Please select the best interpretation of this result.
a) We can not conclude whether earning more money causes more recycling among U.S. adults because this type of design does not allow us to infer causation.
b) This sample is too small to draw any conclusions about the relationship between income level and amount of recycling for adults in the U.S.
c) This result indicates that earning more money influences people to recycle more than people who earn less money.

How did students answer this item?
Response (N = 743)   Pretest   Posttest
a)                     54.6%      52.6%
b)                     18.3%      11.4%
c)                     27.1%      35.9%

Is this surprising? What can we learn from students’ responses to this item? What are the implications and directions for research? For teaching?
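The response tables report marginal percentages only. When each student’s pretest and posttest answers can be matched, cross-tabulating (pretest answer, posttest answer) pairs shows whether, for example, the drop in option (b) and the rise in option (c) on Item 2 come from the same students. A minimal Python sketch, assuming answers are stored as matched lists of option letters; the function names and toy data are hypothetical.

```python
from collections import Counter

def percent_by_option(responses, options):
    """Percentage of respondents choosing each option, rounded to one decimal."""
    n = len(responses)
    counts = Counter(responses)
    return {opt: round(100 * counts[opt] / n, 1) for opt in options}

def transitions(pre, post):
    """Count (pretest answer, posttest answer) pairs for matched students."""
    if len(pre) != len(post):
        raise ValueError("pre and post must be matched, one entry per student")
    return Counter(zip(pre, post))

# Hypothetical matched answers for 8 students.
pre = ["a", "a", "b", "c", "a", "b", "c", "a"]
post = ["a", "c", "c", "c", "a", "a", "c", "a"]
print(percent_by_option(pre, ["a", "b", "c"]))   # {'a': 50.0, 'b': 25.0, 'c': 25.0}
print(percent_by_option(post, ["a", "b", "c"]))  # {'a': 50.0, 'b': 0.0, 'c': 50.0}
moves = transitions(pre, post)
print(moves[("b", "c")], moves[("a", "c")])      # 1 1
```

Here the marginal share of (c) doubles, while the transition counts show it is fed by one student leaving (a) and one leaving (b).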
Example Item 3
Measured learning outcome: ability to match a scatterplot to a verbal description of a bivariate relationship.
Bone density is typically measured as a standardized score with a mean of 0 and a standard deviation of 1. Lower scores correspond to lower bone density. Which of the following graphs shows that as women grow older they tend to have lower bone density?
[Figure: three scatterplots, Graph A, Graph B, and Graph C]
a) Graph A
b) Graph B
c) Graph C

How did students answer this item?
Response (N = 748)   Pretest   Posttest
a) Graph A             90.5%      92.5%
b) Graph B              6.1%       6.6%
c) Graph C              3.3%       0.9%

Is this surprising? What can we learn from students’ responses to this item? What are the implications and directions for research? For teaching?

Example Item 4
Measured learning outcome: understanding of the purpose of randomization in an experiment.
A recent research study randomly divided participants into groups who were given different levels of Vitamin E to take daily. One group received only a placebo pill. The research study followed the participants for eight years to see how many developed a particular type of cancer during that time period. Which of the following responses gives the best explanation as to the purpose of randomization in this study?
a) To increase the accuracy of the research results.
b) To ensure that all potential cancer patients had an equal chance of being selected for the study.
c) To reduce the amount of sampling error.
d) To produce treatment groups with similar characteristics.
e) To prevent skewness in the results.

How did students answer this item?
Response (N = 754)   Pretest   Posttest
a)                     41.4%      31.8%
b)                     13.5%      19.8%
c)                     22.7%      29.4%
d)                      8.5%      12.3%
e)                     13.9%       6.6%

Is this surprising? What can we learn from students’ responses to this item? What are the implications and directions for research? For teaching?

How Can We Use the Results?
Begin to look for underlying reasons students are having difficulties:
- Examine the research literature.
- Interview students to gain a more in-depth understanding of their reasoning.
- Compare results with data from other classes (other teachers, schools).
The results can also inform our instruction:
- Reconsider how difficult or easy some concepts are for students.
- Rethink how we currently teach these ideas.
- Add new activities or tools.
- Re-allocate classroom time.
- Change the way we assess students: assessment items better aligned with learning outcomes, and assessment items that probe students’ reasoning.

SATS: Survey of Attitudes Toward Statistics
Candace Schau and Tom Dauphinee (http://www.unm.edu/~cschau/satshomepage.htm)
Twenty-eight-item survey (9 + 6 + 6 + 7 items across the original four subscales), answered on a seven-point Likert scale: 1 = Strongly disagree, 4 = Neither agree nor disagree, 7 = Strongly agree.

SATS: original four subscales
- Value (9 items; α range .80-.90): “Statistics is worthless.”
- Affect (6 items; α range .80-.85): “I like statistics.”
- Cognitive Competence (6 items; α range .77-.85): “I have no idea of what’s going on in statistics.”
- Difficulty (7 items; α range .64-.79): “Statistics is a complicated subject.”

SATS: two additional subscales
- Interest (4 items): “I am interested in using statistics.”
- Effort (4 items): “I plan to complete all of my statistics assignments.”

SATS
Attitude is a multifaceted outcome. Issues to consider:
- Pre-existing attitudes
- Direction and magnitude of changes over a semester
- Relevance of items to the study

Using the SATS: A Case Study
Assessment of a project-rich introductory statistics course, Fall 2004, at Cleveland State University.
- Class 1: 30 students, pre/post
- Class 2: 16 students, pre/post
- SATS administered on the first day of class and on the final exam day

Class 1: Project-rich
- 4 team projects that used or required real data, computer software, collaboration, and writing
- Individualized midterm and take-home data analysis exams
http://academic.csuohio.edu/holcombj/eku/index.html (Login: holcomb; pwd: projects22)

Class 2
- TI-83
- In-class demos
- Homework and exams

Comparison of Pre Data
No significant difference between Class 1 and Class 2 in pre-course scores.
[Figure: pre-course scores by class (Class 1, Class 2) for Affect, Cognitive Competence, Value, Difficulty, Interest, and Effort; vertical axis 1-7]

Class 1: Change from Pre to Post (two-sided tests)
Significant differences: Cognitive Competence, Value, Difficulty*, Interest
Non-significant differences: Affect, Effort
*Not significant with a nonparametric test.
[Figure: boxplots of pre-minus-post differences for the six components, Class 1]
Component              p-value
Affect                   0.541
Value                    0.018
Cognitive Competence     0.038
Interest                 0.049
Difficulty               0.006
Effort                   0.881

Class 2: Change from Pre to Post (two-sided tests)
Significant difference: Affect (in the wrong direction)
Non-significant differences: Cognitive Competence, Value, Difficulty, Interest, Effort
[Figure: boxplots of pre-minus-post differences for the six components, Class 2]
Component              p-value
Affect                   0.020
Value                    0.522
Cognitive Competence     0.247
Interest                 0.303
Difficulty               0.062
Effort                   0.051

Multivariate Analysis of Post Data by Class
Significant differences: Affect, Value, Interest
Non-significant differences: Cognitive Competence, Difficulty, Effort

Does SATS Ask the Right Questions?
Value component questions:
- Statistics is worthless.
- Statistics should be a required part of my professional training.
- Statistical skills will make me more employable.
- Statistics is not useful to the typical professional.
- Statistical thinking is not applicable in my life outside my job.
- I use statistics in my everyday life.
- Statistics conclusions are rarely presented in everyday life.
- I will have no application for statistics in my profession.
- Statistics is irrelevant in my life.

What are the Questions You Want to Ask?

Instructors: Do try this at home!
But first, set your expectations:
- Results may not be as high as you desire by the end of your course (e.g., CAOS).
- Results may not change from the beginning to the end of your course, or may not change in the direction you anticipate (e.g., SATS).
- The same is true for other instruments.

How might you use such data?
- To better understand students’ learning of particular concepts and skills
- To identify different patterns of student performance
- To establish a starting point for further inquiry
- To make your teaching and students’ learning more effective
- To assess where students start and to reveal areas of difficulty during the course

Some Practical Considerations
- Motivating students to take these instruments seriously: grading? feedback?
- Instrument integrity
- Time to administer
- Others?
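Before pre/post changes like those in the case study can be tested, each SATS component has to be scored. The sketch below assumes, as is common for Likert instruments, that negatively worded items are reverse-coded on the 1-7 scale (8 - x) and that a component score is the mean of its items; the set of reversed Value items is our reading of the wording listed above, not the official SATS scoring key. It then computes a paired t statistic and an exact two-sided sign test on post - pre differences. The slides do not say which nonparametric test was used, so the sign test is a stand-in; a library routine such as scipy.stats.ttest_rel would supply the t-test p-value. All names and data are hypothetical.

```python
from math import comb, sqrt
from statistics import mean, stdev

# Hypothetical key: 0-based positions of negatively worded Value items in the
# order listed above (our reading of the wording, not the official SATS key).
VALUE_REVERSED = {0, 3, 4, 6, 7, 8}

def component_score(responses, reversed_items, scale_max=7):
    """Mean of 1..scale_max Likert responses after reverse-coding flagged items."""
    recoded = [(scale_max + 1 - r) if i in reversed_items else r
               for i, r in enumerate(responses)]
    return mean(recoded)

def paired_t(pre, post):
    """Paired t statistic for post - pre differences (df = n - 1)."""
    d = [b - a for a, b in zip(pre, post)]
    return mean(d) / (stdev(d) / sqrt(len(d)))

def sign_test_p(pre, post):
    """Exact two-sided sign test on post - pre differences; ties are dropped."""
    d = [b - a for a, b in zip(pre, post) if b != a]
    n = len(d)
    k = min(sum(x > 0 for x in d), sum(x < 0 for x in d))
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# One hypothetical student's nine Value responses.
print(round(component_score([2, 6, 5, 3, 3, 4, 2, 2, 1], VALUE_REVERSED), 2))  # 5.56

# Hypothetical pre/post Value scores for six students.
pre = [4.0, 3.5, 5.0, 4.5, 3.0, 4.0]
post = [4.5, 4.0, 5.0, 5.5, 3.5, 3.5]
print(round(paired_t(pre, post), 2))  # 1.58
print(sign_test_p(pre, post))         # 0.375
```

With small classes like these (16 and 30 students), the two tests can disagree, which is one way a result such as the Class 1 Difficulty change can be significant parametrically but not nonparametrically.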
INQUERI Project
INQUERI = Initiative for Quantitative Education Research Infrastructure
- To build a research infrastructure by focusing on the development, deployment, user training, and archiving of high-quality research methods, instruments, and data
- To disseminate these methods and results
- To catalyze research collaborations
See www.inqueri.org

Back to the Big Picture
- Focus on the question or goal you want to address, and relate it to past research.
- Start small: using existing instruments is one way; working within your own course is another.
- Share with colleagues, connect with the literature, and then extend.

References
delMas, R., Garfield, J., Ooms, A., & Chance, B. (2006). Assessing students' conceptual understanding after a first course in statistics. Paper presented at the Annual Meeting of the American Educational Research Association, San Francisco, CA.
Garfield, J., delMas, R., & Chance, B. (n.d.). Assessment Resource Tools for Improving Statistical Thinking. Retrieved May 8, 2007, from https://app.gen.umn.edu/artist/index.html
Survey of Attitudes Toward Statistics homepage: http://www.unm.edu/~cschau/satshomepage.htm
Dauphinee, T. L., Schau, C., & Stevens, J. J. (1997). Survey of Attitudes Toward Statistics: Factor structure and factorial invariance for females and males. Structural Equation Modeling, 4, 129-141.
Schau, C., Stevens, J., Dauphinee, T. L., & Del Vecchio, A. (1995). The development and validation of the Survey of Attitudes Toward Statistics. Educational and Psychological Measurement, 55, 868-875.
Hilton, S. C., Schau, C., & Olsen, J. A. (2003). Survey of Attitudes Toward Statistics: Factor structure invariance by gender and by administration time. Structural Equation Modeling, 11, 92-109.

Contact Information
Sterling Hilton: hiltons@byu.edu
Andy Zieffler: zief0002@umn.edu
John Holcomb: j.p.holcomb@csuohio.edu
Marsha Lovett: lovett@csuohio.edu