Correlation, Causation, & Evaluation A PRACTITIONER’S GUIDE TO RESEARCH METHODS JUSTIN C. SHEPHERD, PH.D. Why it’s a good idea you signed up for this workshop… Big data and analytics are becoming increasingly important Data driven decision making Eliminate hunches or guesstimates Improper methods can lead to misleading results No data is better than bad data Turn data into results! Provide accurate, meaningful findings and interpret appropriately to provide context and understanding Introductions Name Affiliation Title / Responsibilities Statistical Expertise What you’re hoping to learn Outline Outline Developing Research Questions (1 hour) Strategizing & Game Planning (2 hours) Math from a Fire Hose (2.5 hours) Interpretation & Application (1.5 hours) * Short breaks every 45 minutes to 1 hour Developing Research Questions What are the issues with this statement? Example #1 Magnitude of an Effect Size Think about the denominator Standard Error Statistical Significance What are some similar examples we’ve encountered? 150% increase in enrollment for Pacific Islanders. Went from 2 students to 5 but only account for <1% of the total student population. 50 point average increase in SAT scores after enrollment in SAT prep course. For a course with 3 people in it. 1 person increased by 200 points. 1 person stayed the same. 1 person decreased by 50 points. 33% success rate? 33% failure rate? Helpful? What are the issues with this statement? Politics ≠ Statistical Results . . . And that’s okay . . . sometimes What are some similar examples we’ve encountered? Save $1 million / year by keeping computers 1 year longer Example #2 Slow operating system. Lower morale. Dated software and support. Average high school GPA could increase if we only accepted international students. Access. Equity. Taxpayer accountability. What are the issues with this statement? 
Example #3
Data → Decision, NOT Decision → Data
Think of your population
Think of your sample
Think of the limitations

What are some similar examples we’ve encountered?
A sample of remedial students has lower GPAs than traditional students.
Participating in Greek affairs increases participation in student activities.
Students who went to the gym before 8am were more likely to be retained.

What are the issues with this statement?
Likert Scales
“Not Applicable”, “No Opinion”, “Undecided” or “Neutral”

What are some similar examples we’ve encountered?

Example #4
Who reads Inside Higher Ed?
Who comments on Inside Higher Ed?
How might these people be different than the population?

Good questions beget good answers.
These examples are widespread!
Learn how to think for yourself.
Learn how to draw your own conclusions from raw data, not fancy graphics.
Learn how to protect your work from naysayers.

Developing Research Questions
What are some common questions?
Where do they arise from?
What makes a “good” question versus a “bad” question?

Developing Research Questions
COMMON QUESTIONS    SOURCE / FROM
Example 1           Example 1

Things to consider when developing a question
What would happen if we admitted more Freshmen?
Areas affected: Admissions, Housing, Infrastructure, Well Being, Safety, Academic Support, Parking
Sub-topics shown on the diagram: Selectivity, Transfer Cohort, Yield, On-Campus, Hours of Operation, High School GPA, SAT Scores, Room & Board Costs, Maintenance, Space, Off-Campus, RA’s, Counseling, Parking Spaces, Gym Facilities, Ticketing, Campus Police, Tutoring, Emergency Management, Escorts / Rides, Advising

Things to consider when developing a question
Even relatively simple questions can turn into complex analyses very quickly.
Is the question specific?
What are the units of analysis? How were they selected? How long is the observation time? What are our resources (time & money)? Developing Research Questions Was our program successful? What program? What’s the definition of success? Specificity Did our first-year orientation program result in an increase to retention rates? Developing Research Questions Did our first-year orientation program result in an increase to retention rates? For whom? Treatment versus Control Participants in the first-year orientation program as compared to non-participants Did our first-year orientation program result in an increase to retention rates for participants as compared to non-participants? Developing Research Questions Did our first-year orientation program result in an increase to retention rates for participants as compared to non-participants? What if it’s required of everyone? There is no treatment or control. How do we know it’s not a coincidence? Before and After Do we have enough historical data to make a claim? Did the introduction of our first-year orientation program in AY 2010 result in an increase to retention rates for participants as compared to non-participants? Developing Research Questions Did the introduction of our first-year orientation program in AY 2010 result in an increase to retention rates for participants as compared to non-participants? Was it voluntary? Why did they participate? What would have happened if they hadn’t participated? Random Assignment This is where we often fail. Most students self-select to participate in a program. There is no random assignment. Without random assignment, can’t compare participants to non-participants. Did the introduction of our first-year orientation program in AY 2010 result in an increase to retention rates for participants who otherwise would not have participated? 
Developing Research Questions Did the introduction of our first-year orientation program in AY 2010 result in an increase to retention rates for participants as compared to non-participants? Was it worth it? How much did it cost? Cost-Benefit Analyses In higher education, this is frequently subjective. It’s hard to measure quality. Was the identified 1.5% increase to retention rates for participants of the first-year orientation program worth the $3 million investment? Developing Research Questions Break into groups and discuss real examples of research questions. What were some good components of the questions? Were there areas of miscommunication / misunderstanding? What was the result? How could they have been improved? Use the examples provided or create your own. Write these down. This is important, we’ll be using these throughout the day. Discussion. Developing Research Questions Why can’t you just compare people that participated to people that didn’t? Selection Bias Self-Selection The people that chose to participate are different than those who didn’t participate. Student government leaders are inherently different from the rest of the class. Student government may not have improved their abilities, those with high abilities chose to run for student government. Developing Research Questions Why can’t you just compare people that participated to people that didn’t? Selection Bias Forced Selection The people that were forced to participate are different than those who didn’t participate. Those who are required to seek academic tutoring are different than those who are not required. Tutoring may have helped, but it may not look like much since they struggled in the first place. Developing Research Questions Why can’t you just compare people that participated to people that didn’t? Selection Bias Spurious Relationships Unobserved student characteristics matter. Student government leaders have a higher motivation and stronger social ties. 
Those needing tutoring may be struggling because of financial need, employment, or family obligations.
Those needing tutoring may also be receiving private tutoring or outside academic counseling.

Developing Research Questions
Why can’t you just compare people that participated to people that didn’t?
Simultaneity / Reverse Causality
Both the treatment and outcomes are changing at the same time.
As crimes on campus increase, the size of the campus police force is increased.
But a larger police force doesn’t equate to more crime, even though there is a positive correlation.
[Diagram: Campus Crime and Campus Police, labeled “Positive Relationship” and “Negative Relationship”]

Developing Research Questions
Why can’t you just compare people that participated to people that didn’t?
History
Environmental changes may cause a temporary or permanent shift.
Introduction of the MS in Computer Science MOOC.

Developing Research Questions
Why can’t you just compare people that participated to people that didn’t?
Selection Bias
Self-Selection
Spurious Relationships
Simultaneity / Reverse Causality
History
Without a traditional experiment, you have to use math to help you identify treatment effects.
UH-OH

Developing Research Questions
Knowing what we now know, if you could propose one research question to the president at your institution, what would it be?
Write these down. This is important, we’ll be using these throughout the day.

Break

Strategizing and Game Planning
[Diagram: scattered X’s and O’s]
Scatterplot?
X O O X O Strategizing and Game Planning X X X X X X O O O X O X X O O O O Regression? X O O X O Strategizing and Game Planning X X X X X X O O O X O O O X X O X O O O Gameplan. Time to get into the X’s and O’s of Research Design. X O Experimental Designs So what is a traditional experiment? R O X O Treatment group R O O Control group R is random assignment X is the treatment Experiments are the gold standard against which all other research is evaluated. Experimental Designs Why are traditional experiments important? Especially since we rarely do them? Experiments form the mathematical foundation of causation Control group and treatment group are the same Only difference is presence of treatment Difference in performance is attributable to only the treatment Treatment causes the effect Causality What do you need in order to establish causality? Association Time Order Nonspuriousness Mechanism Context Causality What do you need in order to establish causality? Association Correlation between X and Y Increasing financial aid by $1000/yr increases the probability of graduation by 10% Direction Magnitude Causality What do you need in order to establish causality? Time Order Increasing financial aid by $1000/yr increases the probability of graduation by 10% Financial aid affects graduation Financial aid must come before graduation Causality What do you need in order to establish causality? Time Order What’s wrong with the following statement? “As retention rates increase by 1 percentage point, the average SAT score increases by 100 points.” High correlation, but reverse causality. The time order is reversed. Retention rates don’t cause SAT scores to increase. SAT scores might cause retention rates to increase. Causality What do you need in order to establish causality? Time Order What’s wrong with the following statement? “As high school GPA increases by 0.10, the average SAT score increases by 100 points.” High correlation, but simultaneity. 
Both things are happening at the same time.
High school GPA doesn’t cause SAT scores to increase.
SAT scores don’t cause high school GPA to increase.
They are both highly correlated measures of the same principle, aptitude.

Causality
What do you need in order to establish causality?
Nonspuriousness
No outside factors could cause the relationship
[Figure: “Surfers Beware! Sharks Scream for Ice Cream.” Ice Cream Sales (in millions) plotted with Number of Shark Attacks]

Causality
What do you need in order to establish causality?
Nonspuriousness
No outside factors could cause the relationship
[Figure: two panels, Ice Cream Sales (in millions) plotted with Temperature, and Shark Attacks plotted with Temperature]

Causality
What do you need in order to establish causality?
Mechanism
How something happens.
Temperature ↑ → People Get Hot → People Want to Cool Down → Ice Cream Sales ↑
Temperature ↑ → People Get Hot → People Want to Cool Down → People Swim → Shark Attacks ↑

Causality
What do you need in order to establish causality?
Context
Context in which it happens. For whom? When? Under what conditions?
Increasing financial aid by $1000/yr increases the probability of graduation by 10%
Maybe it’s actually 12% for black students
Maybe it’s only 6% for white students
Maybe it’s only 3% at community colleges

Causality
What do you need in order to establish causality?
Association
Time Order
Nonspuriousness
Mechanism
Context

Experimental Designs
Experimental designs help establish causality by isolating the effect of the treatment.
R  O  X  O   Treatment group
R  O     O   Control group
R is random assignment
X is the treatment

Experimental Designs
R  O  X  O
R  O     O
Association: X correlated to O.
The treatment is correlated to the post-test results.
As the treatment changes, the post-test results are likely to change.
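Returning to the ice cream and shark attack example above, the sketch below shows how a third factor can create a correlation between two outcomes that do not affect each other. All numbers are made up for illustration (the temperature range, the slopes, and the noise levels are assumptions, not workshop data), and the helper names `pearson` and `residuals` are hypothetical.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """Remove the straight-line effect of xs from ys (simple least squares)."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


# Hypothetical data: temperature drives BOTH outcomes; neither drives the other.
rng = random.Random(7)
temperature = [rng.uniform(50, 100) for _ in range(1000)]
ice_cream_sales = [5.0 * t + rng.gauss(0, 20) for t in temperature]
shark_attacks = [0.2 * t + rng.gauss(0, 2) for t in temperature]

# Raw correlation looks strong even though there is no causal link.
raw_corr = pearson(ice_cream_sales, shark_attacks)

# After removing temperature from both series, the relationship vanishes.
adjusted_corr = pearson(residuals(ice_cream_sales, temperature),
                        residuals(shark_attacks, temperature))

print(f"raw correlation:      {raw_corr:.2f}")
print(f"adjusted correlation: {adjusted_corr:.2f}")
```

The raw correlation is large, while the correlation left after accounting for temperature is close to zero. This is the nonspuriousness requirement at work: the apparent relationship comes from the outside factor.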
Experimental Designs
R  O  X  O
R  O     O
Time Order: X associated with post-test results. Otherwise, results wouldn’t change.
Without treatment, the pre-test and post-test results should be the same.
Differences in the treatment group between the pre-test and post-test are therefore because X, the treatment, resulted in the change.

Experimental Designs
R  O  X  O
R  O     O
Non-spuriousness: Ensures equivalent groups.
Random assignment lets us assume the groups are equivalent: everyone has the same probability of receiving either the treatment or the control, so the groups, if large enough, are not likely to differ.
Equivalent groups means there is nothing else that could have caused the difference between the treatment and control groups.

Experimental Designs
R  O  X  O
R  O     O
Gives a baseline.
Pre-tests help to show that the groups are equivalent by establishing a baseline. If there are large differences in the pre-test, the assignment may have failed.
Pre-tests also help to show the difference made by the treatment by comparing measures before and after the treatment was administered.

Experimental Designs
R  O  X  O
R  O     O
Take the difference in the results.
Because the groups are randomly assigned, and the only difference between the groups is whether they were treated or not, the treatment effect is simply the difference between those who were treated and those who were not.

Experimental Designs
But there can be all sorts of different experimental designs:
R  O  O  O  O  X  O  O  O  O
R  O  O  O  O     O  O  O  O

R  O  X  O  O  O  O  O  O  O
R  O     O  O  O  O  O  O  O
Multiple observations over time help to account for environmental changes.
Multiple observations after the treatment look for a diminishing or lagged effect.
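As a rough illustration of the pre-test / post-test design above, the sketch below simulates random assignment and then takes "the difference in the results." Everything here is hypothetical: the score scale, the noise levels, the true effect of 5 points, and the function names are assumptions chosen for the example, not part of the workshop materials.

```python
import random
from statistics import mean


def simulate_experiment(n_students=2000, true_effect=5.0, seed=42):
    """Simulate an R O X O / R O O design with hypothetical test scores."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n_students):
        pre = rng.gauss(70.0, 10.0)          # O: pre-test score
        is_treated = rng.random() < 0.5      # R: coin-flip assignment
        post = pre + rng.gauss(0.0, 3.0)     # O: post-test without treatment
        if is_treated:
            post += true_effect              # X: treatment shifts the post-test
            treated.append((pre, post))
        else:
            control.append((pre, post))
    return treated, control


def mean_change(group):
    """Average post-test minus pre-test change within one group."""
    return mean([post - pre for pre, post in group])


treated, control = simulate_experiment()

# Baseline check: with random assignment the pre-test means should be close.
pretest_gap = (mean([pre for pre, _ in treated])
               - mean([pre for pre, _ in control]))

# Treatment effect: change in the treatment group minus change in the control group.
estimated_effect = mean_change(treated) - mean_change(control)

print(f"pre-test gap between groups: {pretest_gap:.2f}")
print(f"estimated treatment effect:  {estimated_effect:.2f}")
```

Because assignment is random, the pre-test gap is small and the estimate lands near the effect that was built into the simulation. With self-selected groups, neither of those would be guaranteed.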
Experimental Designs But there can be all sorts of different experimental designs: R O X O X O X O R O O R O X O R O O R X O R O O O Multiple treatments help to identify patterns. Eliminating a pre-test for select groups ensures that the pre-test did not influence the posttest results. Experimental Designs But there can be all sorts of different experimental designs: R O X1 O R O X2 O R O X3 O R O O Different treatments can help isolate which treatment is best. Experimental Designs No matter which design, they all have key common elements: 1. Treatment and Control Group 2. Before and After Observations 3. Random Assignment Without these elements, the fundamental assumptions of experimental research break down and we must use math to isolate the effect. Experimental Designs Break into teams. Using your questions raised earlier, figure out a way how you might be able to design an experiment. Which type of design did you use? Why? How would you assign / select the participants? What are the potential pitfalls? Experimental Designs Example & Discussion. Read the handout about the FAFSA Experiment. What is/are the treatment(s)? How were people assigned? Which design was used? What does this tell us about the application process for financial aid? How might you change the experiment? What else could be done to expand upon these findings? Break Did the introduction of our first-year orientation program in AY 2010 result in an increase to retention rates for participants as compared to non-participants? O X O Treatment group O O Control group No random assignment. Students choose to participate. OR O X O Treatment population No random assignment or control group. Did the introduction of our first-year orientation program in AY 2010 result in an increase to retention rates for participants as compared to non-participants? X O Treatment group O Control group No random assignment. No pre-test. Students choose to participate. 
OR X O Treatment population No random assignment, pre-test, or control group. Did the introduction of our first-year orientation program in AY 2010 result in an increase to retention rates for participants as compared to non-participants? Why can’t we just design an experiment? Research Ethics Pitfalls of Experimental Designs Informed Consent Choice Do No Harm Human Subjects No experiment is worth the results if people are manipulated against their will, have their privacy invaded, or are harmed emotionally, psychologically, physically, or otherwise. Research Ethics Example & Discussion. Read the handouts about the Stanford Prison Experiment and the Facebook Experiment. What are the ethical issues involved with each experiment? If you were part of IRB, would you approve the Stanford Prison Experiment? If you were part of IRB, would you approve the Facebook Experiment? Research Ethics Pitfalls of Research Designs Data Fabrication Results Manipulation Program/Treatment Falsification Conflict of Interest Personal Gain Questions drive research. Answers do not. Research Ethics University of Missouri at Kansas City (February 2015) “The University of Missouri at Kansas City gave the Princeton Review false information designed to inflate the rankings of its business school, which was under pressure from its major donor to keep the ratings up…” Inside Higher Ed Georgia Institute of Technology (January 2015) “…a former tenured professor of electrical engineering at Georgia Institute of Technology, has been indicted on two counts of racketeering, based on allegations that he poured some $1 million of university funds into his own tech company…” Inside Higher Ed University of North Carolina (October 2014) “Over nearly two decades, professors, coaches, and administrators either participated in the scheme or overlooked it, undercutting the core values of one of the nation’s premier public universities.” Chronicle of Higher Education Tulane University (January 2013) “U.S. 
News & World Report has moved Tulane University’s business school to the “unranked” section of its business-school listings after the school’s recent admission that it had inflated test scores and the number of completed applications to its full-time M.B.A. program for several years.” Chronicle of Higher Education Research Ethics George Washington University (November 2012) “George Washington officials said they later discovered that the admissions office had been estimating the class rank for high-performing students whom they “assumed” were in the top 10 percent of their classes, based on their grade-point averages and standardized-test scores.” Chronicle of Higher Education Emory University (August 2012) “Emory University intentionally misreported its admissions data for more than a decade, with the knowledge and participation of the leadership of the admission and institutional-research offices” Chronicle of Higher Education Claremont McKenna College (January 2012) “A senior administrator at Claremont McKenna College has resigned after admitting to falsely reporting SAT statistics since 2005…” Chronicle of Higher Education American Psychological Association Ethics Code “authors should not submit manuscripts that have been published elsewhere in substantially similar form or with substantially similar content.” Revised Garbage Can Kingdon, 1995 Problems Solutions Policymaker Politics Decision Research Ethics We are researchers, not policymakers. Questions drive research. Answers do not. Problems Politics Research Solutions Research Ethics Research should be objective and unbiased. Let the research generate solutions and leave the politics of the decision making to the policymakers. Problems Politics Research Solutions Research Ethics Pitfalls of Research Designs Data Fabrication Results Manipulation Program/Treatment Falsification Conflict of Interest Personal Gain Questions drive research. 
Answers do not.
Do not let your research integrity be jeopardized by people, politics, or circumstances.
It’s unethical.
It’s illegal.
It invalidates everything you’ve ever done or will ever do.

Did the introduction of our first-year orientation program in AY 2010 result in an increase to retention rates for participants as compared to non-participants?
So back to our original question: Why can’t we just design an experiment?
Choice
Students should be allowed to decide if they want to participate
O  X  O
O     O
Withholding a potential benefit
Don’t harm non-participants if you know it’s a beneficial program
O  X  O

Since we don’t have an experiment, what can we do?
The lack of an experiment threatens causality.
Now the groups differ on either observable or unobservable characteristics.
Differences between groups mean that the result may have been due to differences in populations and not due to the treatment.

Quasi-Experimental Designs
Compare participants to non-participants? No!
The reason why they participated often directly impacts the outcome.
High achieving students are more likely to participate in the first-year orientation program, making it look better than it actually is.
[Diagram: Participants vs. Non-Participants]
“Participants performed better than non-participants… but they would have anyhow! The program actually didn’t do anything.”

Quasi-Experimental Designs
Compare participants to non-participants? No!
The reason why they participated often directly impacts the outcome.
Low achieving students are more likely to participate in remedial education, making it look worse than it actually is.
Policy: “Students with less than a 500 on their SAT Math are required to enroll in MATH 0150.”
GPA of MATH 0150 Students = 2.0
GPA of Non-MATH 0150 Students = 3.0
Non-Analyst: “MATH 0150 is associated with a lower GPA. We should eliminate MATH 0150.”
Analyst Response: “Had these students not enrolled in MATH 0150, their GPA would have been 1.5. MATH 0150 actually improves GPA by 0.5!”

Quasi-Experimental Designs
Compare participants to non-participants? No!
The reason why they participated often directly impacts the outcome.
High achieving students are more likely to participate in the first-year orientation program, making it look better than it actually is.
Low achieving students are more likely to participate in remedial education, making it look worse than it actually is.
Need to identify the counterfactual: what would have happened had the program never been introduced.
Yet the simple comparison between participants and non-participants is what we oftentimes present.

Quasi-Experimental Designs
In essence, because the groups differ, we’re not interested in comparing Participants to Non-Participants.
Instead, we want to try to compare the results of participants to what would have happened had they not received treatment:
Participants Receiving Treatment to Alternate Reality without Treatment
As you sci-fi fans can imagine, this might be a bit difficult given our current understanding of physics.

Quasi-Experimental Designs
Since we can’t compare alternate realities, we have to try to develop a control group that resembles the treatment group as closely as possible:
Participants to Non-Participants
But no matter how closely we try to match the groups, without random assignment, there will always be slight differences between the groups.
There is no perfect match. Even if all the measured variables match, there will always be unobserved characteristics that cannot be captured.

The groups look equivalent…
              Treatment Group   Control Group
Age                18.6             18.7
Female             0.61             0.60
Asian              0.11             0.12
Black              0.13             0.12
Hispanic           0.13             0.13
White              0.60             0.61
HS GPA             3.86             3.84
SAT Verbal         604              606
SAT Math           711              709

Then why might the treatment group have decided to participate in orientation while the control group did not?
TREATMENT GROUP            CONTROL GROUP
Extrovert                  Introvert
Social                     Isolated
School Pride / Spirit      Career Oriented
Extrinsically Motivated    Intrinsically Motivated
None of these are being measured! The groups will differ.

Quasi-Experimental Designs
Compare participants to non-participants after controlling for their characteristics? It’s a start.
Use regressions to control for student characteristics.
Students who participated in the first-year orientation program were 5% more likely to be retained for their second year after controlling for academic preparation.
But we can’t measure everything.
Multivariate and Logistic Regression Models

Quasi-Experimental Designs
RETENTION = ƒ (treatment, age, gender, race/ethnicity, … )
OR
RETENTION = β0 + β1 treatment + β2 age + β3 gender + β4 race/ethnicity + …
OR
RETENTION = 0.0013 + 0.05 treatment – 0.01 age + 0.03 gender + 0.02 race/ethnicity + …
The coefficient on treatment (β1 = 0.05) is the coefficient of interest.

Quasi-Experimental Designs
Match participants to non-participants based on their characteristics? Better!
Match participants to non-participants based on their characteristics and then compare the results.
Students who participated in the first-year orientation program were 2.8% more likely to be retained for their second year when compared to matched non-participants.
Yet again, still can’t measure everything. Not everyone has a match.
Propensity Score Matching

Quasi-Experimental Designs
TREATMENT GROUP   CONTROL GROUP   TREATMENT EFFECT
      82               79                3
      86               82                4
      85               81                4
      82               80                2
      81               79                2
      80               78                2
                          Average:       2.83

Quasi-Experimental Designs
Look at changes in retention rates over time? Sure, but cautiously.
Look at the trajectory before the first-year orientation was introduced.
Then look at the trajectory after the first-year orientation was introduced.
Look for a jump in retention rates.
But there’s a lot out there that could have also happened during this time to explain any jumps, so you’ll need to take some additional steps to isolate the effect of the orientation program.
Longitudinal and Time-Series Analyses
Fixed Effects
Difference-in-Differences

Since the introduction of the program in 2010, retention rates have risen…
[Figure: Retention Rates by year, 2006 to 2014, with the program (X) introduced in 2010]
Program Success!?

But at a lower rate than they were before the program was introduced.
[Figure: Retention Rates by year, 2006 to 2014, showing the trend before and after the program (X) in 2010]
Program Failure!

An example with treatment and control groups.
[Figure: Retention Rates by year, 2006 to 2014, for the Treatment Group and the Control Group, with the Treatment Effect marked after the program (X) in 2010]

Quasi-Experimental Designs
The most important part of quasi-experimental designs is the DESIGN.
If there is no design, you’ve limited yourself to post hoc analysis.
Spend time designing the implementation.
Treatment and control groups
Pre-test and post-test
Timing
Scale
Pilot programs

Quasi-Experimental Designs
The better the design, the more tools at your disposal.

Program Evaluation
Reliability: Are the methods of measurement consistent?
Validity: Are the results measured correctly?
Implementation Fidelity: Is the program being run according to plan?
Treatment Effect: Is the treatment having the intended effect?
Cost-Benefit: Are the results worth the cost?

Pop Quiz
QUESTION: Does participation in the undergraduate research program lead to higher GPAs?
METHOD: Group Comparison / Regression / Matching / Time-Series Analysis / None of the Above
Why? What’s the underlying issue?

Pop Quiz
QUESTION: Are Hispanic students more likely to major in Industrial Engineering?
METHOD: Group Comparison / Regression / Matching / Time-Series Analysis / None of the Above
Why? What’s the underlying issue?
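Two of the quasi-experimental estimates discussed above can be sketched in a few lines. The matched-pair numbers are the six pairs from the matching slide. The yearly retention rates used for difference-in-differences are hypothetical placeholders (the values behind the charts are not recoverable here), and the function names are assumptions made for this example.

```python
from statistics import mean


def matched_pair_effect(pairs):
    """Average treated-minus-control difference across matched pairs."""
    return mean([treated - control for treated, control in pairs])


def difference_in_differences(treat_pre, treat_post, control_pre, control_post):
    """Change in the treatment group minus change in the control group."""
    treat_change = mean(treat_post) - mean(treat_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treat_change - control_change


# Matched pairs from the slide: (treatment group, control group).
pairs = [(82, 79), (86, 82), (85, 81), (82, 80), (81, 79), (80, 78)]
matching_estimate = matched_pair_effect(pairs)

# Hypothetical retention rates (%), four years before and four years after 2010.
treat_pre, treat_post = [76.0, 77.0, 78.0, 79.0], [81.0, 82.0, 83.0, 84.0]
control_pre, control_post = [75.0, 76.0, 77.0, 78.0], [78.5, 79.5, 80.5, 81.5]
did_estimate = difference_in_differences(
    treat_pre, treat_post, control_pre, control_post
)

print(f"matched-pair estimate: {matching_estimate:.2f}")
print(f"difference-in-differences estimate: {did_estimate:.2f}")
```

In the hypothetical series both groups were already rising, so a before-and-after comparison of the treatment group alone would overstate the program’s effect. Subtracting the control group’s change removes the shared trend.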
Pop Quiz QUESTION METHOD Did the implementation of the mandatory requirement to enroll in a Freshman seminar improve retention rates? Group Comparison Regression Matching Time-Series Analysis Why? What’s the underlying issue? None of the Above Pop Quiz QUESTION METHOD Are West Point students who attend the ArmyNavy game more likely to be retained? Group Comparison Regression Matching Time-Series Analysis None of the Above Why? What’s the underlying issue? Pop Quiz QUESTION METHOD Do athletes who receive tutoring have higher GPA’s than those who do not? Group Comparison Regression Matching Time-Series Analysis None of the Above Why? What’s the underlying issue? Pop Quiz QUESTION METHOD Do faculty in California earn more than faculty in New York? Group Comparison Regression Matching Time-Series Analysis None of the Above Why? What’s the underlying issue? Pop Quiz QUESTION METHOD Does participation in Greek life lead to a higher probability of graduation within 6 years? Group Comparison Regression Matching Time-Series Analysis None of the Above Why? What’s the underlying issue? Quasi-Experimental Designs Break into teams. Using your questions raised earlier, figure out a way how you might be able to address the experimental shortcomings. How would you design your program? Treatment. Control. Pilot. Timing. Etc. Which type of quasi-experimental design would you use? Group comparison. Regression. Matching. Time Series. Etc. Why? Break Math from a Fire Hose P L E A S E D O W N LO A D S H EPH ER D_ ECONOMET R I C S_DATA S ETS. X LSX F O R E X A M P L E D ATA P L E A S E D O W N LO A D S H EPH ER D_ ECONOMET R I C S_E X ERC I S ES. X LSX F O R E X E R C I S E D ATA Math from a Fire Hose You’ll need to install the following Add-In to Microsoft Excel if you’re not planning on using SAS. File Options Add-Ins Analysis ToolPak Go Check Analysis TookPak OK Math from a Fire Hose Data Math from a Fire Hose We‘ve covered: Developing Research Questions What makes a good question. 
How the question drives the research. Why clear, specific questions are important. Math from a Fire Hose We‘ve covered: Developing Research Questions Sources of Bias The issues in trying to address questions without considering statistics. Math from a Fire Hose We‘ve covered: Developing Research Questions Sources of Bias Causality Why the sources of bias are so important. Why correlation does not equal causality. Why causality is the goal. Math from a Fire Hose We‘ve covered: Developing Research Questions Sources of Bias Causality Experimental Designs Why experiments are the gold standard to determine causality. Why it’s so hard to design and establish. Math from a Fire Hose We‘ve covered: Developing Research Questions Sources of Bias Causality Experimental Designs Research Ethics Where research goes wrong. Math from a Fire Hose We‘ve covered: Developing Research Questions Sources of Bias Causality Experimental Designs Research Ethics Quasi-Experimental Designs Methods to address research questions when there is no experiment. Methods to address research questions when there is no design at all (post hoc). Methods to evaluate programs. Math from a Fire Hose We‘ve covered: Developing Research Questions Sources of Bias Causality Experimental Designs Research Ethics Quasi-Experimental Designs Now it’s time to address how each of these work in practice. Math from a Fire Hose Data Types Confidence Intervals t-tests Regression Multivariate Regression Logistic Regression Propensity Score Matching Time-Series Analysis Data Types Character / String Qualitative Feelings, emotions, narrative, politics, “how”, and other aspects that cannot be captured in numbers. Jackie feels very motivated when she gets to study a subject she enjoys. Decisions are typically made by forming committees where the chair directs discussion. Data Types Character / String Qualitative Categorical A non-numeric category which we will then try to turn into a numeric value. Class Freshman. Sophomore. 
Junior. Senior.
  Freshman = 1, Sophomore = 2, Junior = 3, Senior = 4

Data Types
Numeric
Binary: Yes / No answers coded as 1 / 0.
  Gender (Male. Female.) becomes one variable, MALE: Male = 1, Female = 0
  Class (Freshman. Sophomore. Junior. Senior.) becomes one variable per category:
    FRESHMAN: Freshman = 1, Sophomore = 0, Junior = 0, Senior = 0
    SOPHOMORE: Freshman = 0, Sophomore = 1, Junior = 0, Senior = 0
Discrete: integers and counts.
  Enrollment = 22,491
  Cannot have an enrollment count of 22,490.56
Continuous: fractions and decimals.
  GPA = 3.86525, often rounded to 2 decimal places (3.87)
  Money: 4.125% return on $1,256,547 = $1,308,379.56375, often rounded to the dollar or penny ($1,308,379.56)

NOIR Data Types
Nominal: data order has no meaning.
  Race/Ethnicity: 1 = White, 2 = Black, 3 = Hispanic, 4 = Asian, 5 = Other
  Could just as easily have been: 1 = Asian, 2 = Black, 3 = Hispanic, 4 = Other, 5 = White
Ordinal: data order matters, but the scale is subjective.
  Likert Scale: 1 = Strongly Disagree, 2 = Disagree, 3 = Neutral, 4 = Agree, 5 = Strongly Agree
  How much is the difference between Strongly Disagree and Disagree? The same as the distance between Agree and Strongly Agree? The same as the distance between Disagree and Neutral?
Interval: data order has meaning and equal units, but no ratio scale.
  Temperature: 90° to 91° equals 1°. 45° to 46° equals 1°. But 90° is not twice as hot as 45°.
Ratio: data order has meaning, units are equal, and ratio scales are true.
  Money: $1,000 to $1,001 is $1. $500 to $501 is $1. $1,000 is twice as much money as $500.

Confidence Intervals
[Figure: normal distribution with mean = 0 and S.D. = 1; the total area under the curve = 1]
[Figure: 90% confidence interval; area = 0.05 in each tail]
A 90% CI is the same as α = 0.10 (the sum of the two tail areas).
[Figure: 95% confidence interval; area = 0.025 in each tail]
A 95% CI is the same as α = 0.05 (this is the standard level of statistical significance).
Results out in the tails are unlikely to be due to chance.

Confidence Intervals
A 5% significance level (α = 0.05) means accepting a 5% chance of calling a result significant when it is really due to chance.
p < 0.05 means the results are significant (unlikely to be due to chance alone).
This is also affected by the sample size.
  The greater the sample size, the greater the likelihood that statistical significance can arise.
Type I Error: the results are significant when they actually should not be.
Type II Error: the results are not significant but they actually are.
  Would rather find no relationship when one actually exists than find a relationship that isn’t actually there.
Use at least the 5% significance level, if not a stricter one.
  The stricter the level (the smaller the α), the greater the confidence that the results are not due to chance.
  p < 0.001 is highly significant.

[Figure: normal curve with 2-tail p ≈ 0.32] Results are not significant. No difference between means. Could be due to chance.
[Figure: normal curve with 2-tail p ≈ 0.035] Results are significant. Means differ. Not likely due to chance.

For a regression coefficient, p < 0.05 means the coefficient is significantly different from 0.
[Figure: coefficients with 95% CIs for X1 and X2]
  Coefficient and 95% CI contain 0: not significant. The result could equal 0.
  Coefficient and 95% CI do not contain 0: significant.

t-tests
Group comparisons
Looks for a difference in mean values, considering the variation of each group

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$

In words: t-score = (mean1 − mean2) / sqrt(variance1 / sample size1 + variance2 / sample size2)

Good to use when comparing samples of the same population
  Differences between a pre-test and a post-test
  Differences between independent samples
Good to use when estimating regression coefficients
  Whether a coefficient is statistically significant
Excel Example

t-tests
Do they have equal variances? Not usually.
Do they have equal sample sizes?
Output 1: p > 0.05. Not significant. No difference in means.
Output 2: p < 0.05. Significant. 9.83 difference in means.
Output 3: p > 0.05. Not significant.

t-tests
Example 1: There was no statistically significant difference in the test scores. The treatment had no effect on test scores.
Example 2: There was a statistically significant difference in the test scores. The treatment is associated with a 10 point increase in test scores, on average.
Example 3: There was no statistically significant difference in the samples. Despite a 10 point difference in the means, there was no statistically significant difference due to the differences in sample sizes and the large variation. The means could actually be the same.
Fail to reject the notion that they are equal.

t-tests – Exercise
Open the t-test tab in the Exercises workbook and determine if there are differences in the means of:
  X1 and X2
  X2 and X3
  X1 and X3
Is there a statistically significant difference? What is the magnitude of the difference?
When finished, take a short break.

Regression
Basic Correlation
Looks at the relationship between an independent variable and dependent variable
y = β0 + β1 X
y is the dependent variable
X is the independent variable
β0 is the intercept
β1 is the slope (also called the coefficient or effect size)
A 1 unit change in X is associated with a β1 change in y
Can also be used to predict values within the minimum and maximum of X
  If X = 75 then y = β0 + β1 (75)
Excel Example

Regression
Data > Data Analysis > Regression

Regression
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.94284
R Square            0.888947
Adjusted R Square   0.88498
Standard Error      25.66225
Observations        30

ANOVA
             df   SS         MS         F          Significance F
Regression    1   147601.8   147601.8   224.1311   6.85E-15
Residual     28   18439.43   658.5511
Total        29   166041.2

               Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept      26.70054       10.08497         2.647559   0.013163   6.042424    47.35866    6.042424      47.35866
X Variable 1   2.430899       0.162374         14.97101   6.85E-15   2.098292    2.763507    2.098292      2.763507

Regression
If p > 0.05 then the model is no good.
The goodness of fit for the line.
The most important part.

Regression
This is β0. This is β1.
y = 26.701 + 2.431 X
Which is exactly what we got in Excel, but with much more detail.

Regression
Remember the t-tests?
Our t-values = β / SE
Which yields our p-values for significance.

Regression
Since p < 0.05, X is a significant predictor of Y.
If p > 0.05, then X and Y have no significant relationship. A 1 unit change in X isn’t likely to affect Y.
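The two calculations described for the regression output (t Stat = coefficient / standard error, and prediction only inside the observed range of X) can be checked with a few lines of Python. This is a minimal sketch, not part of the workshop materials: the coefficients and standard errors are copied from the Excel output, while the function names and the range limits 10 and 100 are hypothetical placeholders, since the observed minimum and maximum of X are not stated.

```python
# Coefficients and standard errors copied from the Excel SUMMARY OUTPUT.
COEF = {"Intercept": 26.70054, "X Variable 1": 2.430899}
SE = {"Intercept": 10.08497, "X Variable 1": 0.162374}


def t_stat(name):
    """t Stat = coefficient / standard error."""
    return COEF[name] / SE[name]


def predict(x, x_min, x_max, b0=26.701, b1=2.431):
    """Predict y = b0 + b1 * x, refusing to extrapolate outside [x_min, x_max]."""
    if not x_min <= x <= x_max:
        raise ValueError(f"x = {x} is outside the observed range [{x_min}, {x_max}]")
    return b0 + b1 * x


for name in COEF:
    print(name, round(t_stat(name), 3))

# x_min and x_max are placeholder assumptions; use the real minimum and maximum of X.
print(round(predict(75, x_min=10, x_max=100), 3))
```

The t statistics agree with the t Stat column to about three decimal places, and the prediction matches the slide value of 209.026 because it uses the same rounded coefficients.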
Regression
y = 26.701 + 2.431 X
A 1 unit change in X is associated with a 2.431 increase in Y.

Regression
y = 26.701 + 2.431 X
If X = 75, what would we expect Y to be?
1. Make sure that 75 is within the minimum and maximum values of X. Models can fail if trying to predict values outside of observed values.
2. If 75 is within these values, then y = 26.701 + 2.431 (75) = 209.026

[Figure: scatter plot of Y against X with fitted line y = 2.4309x + 26.701, R² = 0.8889]

Regression
In this example, I’ve built in a ton of variation.
Results are still significant, but biased. We know the relationship is Y = 2X, but the extra variation is throwing off our estimate.
[Figure: scatter plot of Y against X with fitted line y = 4.2341x + 424.06, R² = 0.1673]

Regression - Exercise
Open the reg tab in the Exercises workbook and estimate the relationship between SAT and Freshman GPA (YR1_GPA).
What is the magnitude of the relationship?
Is it statistically significant?
What is the expected Freshman GPA of a student who earned a 1450?
Try to derive these results both mathematically and using Data Analysis.
When finished, take a short break.

Multivariate Regression
Basic Correlation
Looks at the relationship between multiple independent variables and the dependent variable
y = β0 + β1 X1 + β2 X2 + β3 X3 + …
y is the dependent variable
X1, X2, and X3 are independent variables
β0 is the intercept
β1 is the slope for X1
β2 is the slope for X2
β3 is the slope for X3
Excel Example (mreg1 & mreg2)

Multivariate Regression
                  BIVARIATE REGRESSION   MULTIVARIATE REGRESSION
Mean of X1        55.00                  55.00
S.D. of X1        29.35                  29.35
Variance of X1    861.31                 861.31
Mean of X2        N/A                    50.40
S.D. of X2        N/A                    28.21
Variance of X2    N/A                    795.77
Mean of Y         160.40                 160.40
S.D. of Y         75.67                  75.67
Variance of Y     5725.56                5725.56
(X1 is labeled X in the bivariate regression.)
The data is exactly the same. The only change is the addition of X2.

Multivariate Regression
BIVARIATE REGRESSION vs. MULTIVARIATE REGRESSION
I built the model with the formula Y = 2X + Error.
Because we are not controlling for the source of the error, the estimate for X is biased.
When we control for the source of the bias (an omitted variable) we get the true estimate of X1.
This is especially important when the omitted variable has a large influence on Y or there is a lot of variation in the variables. Again, we get the true value of X1 once we control for the source of the bias.
As can be seen, if we don’t think statistically, we could get an estimate that is nowhere near the truth.

Multivariate Regression
If omitted variables and other sources of bias are so important, what do we need to include in our models?
y = β0 + β1 X1 + β2 X2 + β3 X3 + …
GPA = β0 + β1 TREATMENT + β2 ACADEMIC_PREP + β3 CONTROLS
Use theory and established literature to develop your models.
Use the most parsimonious model.
Focus on the TREATMENT variable in evaluation designs.

Multivariate Regression Example
GPA = β0 + β1 TREATMENT + β2 ACADEMIC_PREP + β3 CONTROLS
What should be included as controls?
  Anything that could possibly affect y (non-spuriousness)
  Anything that could affect the results of other X’s (independence)
What shouldn’t be included as controls?
  Combinations of another X
    Don’t include SAT Math, SAT Verbal, and SAT Composite because SAT Composite = SAT Math + SAT Verbal
  Combinations of a categorical X
    Don’t include controls for both Male and Female because if you’re not Male, you’re Female
  In essence, don’t double count your X variables

Multivariate Regression Example
GPA = β0 + β1 TREATMENT + β2 ACADEMIC_PREP + β3 CONTROLS
Let’s brainstorm some ideas for GPA.
  SAT Score
  HS GPA
  Pell Recipient
  Greek Life
  On Campus vs Off Campus Housing
  Race / Ethnicity
  Gender

Multivariate Regression
More about categorical X variables
How do we control for race/ethnicity?
  Need to select a comparison group
  Tend to select the largest group
  Need to create binary variables for each category

Student ID   Race / Ethnicity   White   Black   Hispanic   Asian   Other
001          White              1       0       0          0       0
002          Black              0       1       0          0       0
003          White              1       0       0          0       0
004          Asian              0       0       0          1       0
005          Hispanic           0       0       1          0       0
006          White              1       0       0          0       0
007          White              1       0       0          0       0

Comparison Group = White
Coefficients compare each group to the selected comparison group
  For example, a positive coefficient on Hispanic means Hispanic students are associated with a higher GPA than White students

Multivariate Regression Example
GPA = β0 + β1 TREATMENT + β2 ACADEMIC_PREP + β3 CONTROLS
GPA = β0 + β1 TREATMENT + β2 SAT + β3 HSGPA + β4 PELL + β5 GREEK + β6 HOUSING + β7 RACE + β8 GENDER …
Name your binary variables after the affirmative.
GPA = β0 + β1 TREATMENT + β2 SAT + β3 HSGPA + β4 PELL + β5 GREEK + β6 ONCAMPUS + β7 BLACK + β8 HISPANIC + β9 ASIAN + β10 OTHER + β11 MALE …

Multivariate Regression Example
Build Up Approach
Model 1: GPA = β0 + β1 TREATMENT
Model 2 (add academic preparation): GPA = β0 + β1 TREATMENT + β2 SAT + β3 HSGPA
Model 3 (add race/ethnicity & gender): GPA = β0 + β1 TREATMENT + β2 SAT + β3 HSGPA + β4 BLACK + β5 HISPANIC + β6 ASIAN + β7 OTHER + β8 MALE …
Full Model (add other student characteristics): GPA = β0 + β1 TREATMENT + β2 SAT + β3 HSGPA + β4 PELL + β5 GREEK + β6 ONCAMPUS + β7 BLACK + β8 HISPANIC + β9 ASIAN + β10 OTHER + β11 MALE …

Multivariate Regression Example
Tear Down Approach
Full Model: GPA = β0 + β1 TREATMENT + β2 SAT + β3 HSGPA + β4 PELL + β5 GREEK + β6 ONCAMPUS + β7 BLACK + β8 HISPANIC + β9 ASIAN + β10 OTHER + β11 MALE …
Model 2: GPA = β0 + β1 TREATMENT + β2 SAT + β3 HSGPA + β4 GREEK + β5 ONCAMPUS + β6 BLACK + β7 HISPANIC + β8 ASIAN + β9 OTHER + β10 MALE …

Multivariate Regression
Model 1 (treatment only):
Model is strong. Model explains roughly 21% of the variance.
Those receiving the treatment are associated with a 0.76 increase in GPA.
Treatment is a statistically significant predictor of GPA.

Model 2 (adds SAT & HSGPA):
Model is strong. Model now explains roughly 90% of the variance. A vastly improved model fit.
Treatment is no longer statistically significant once controlling for SAT & HSGPA.
But both SAT & HSGPA are statistically significant.
A 100 point increase in SAT scores is associated with a 0.13 increase in GPA, holding all else constant.
A 1 point increase in HSGPA is associated with a 0.60 increase in GPA, holding all else constant.
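The pattern on these slides, where treatment looks significant on its own and then fades once academic preparation is controlled, can be reproduced with constructed data. The sketch below uses Python with NumPy, which is an assumption (the workshop itself works in Excel or SAS). Every number in it is made up for illustration: GPA depends only on a hypothetical preparation score, and better-prepared students are simply more likely to receive the treatment.

```python
import numpy as np


def ols(columns, y):
    """Least-squares coefficients with the intercept first."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
n = 5000

# Hypothetical standardized academic preparation (stands in for SAT & HSGPA).
prep = rng.normal(0.0, 1.0, n)
# Better-prepared students are more likely to end up in the treatment group.
treatment = (prep + rng.normal(0.0, 1.0, n) > 0).astype(float)
# GPA is built from preparation only: the true treatment effect is zero.
gpa = 3.0 + 0.5 * prep + rng.normal(0.0, 0.3, n)

naive = ols([treatment], gpa)             # Model 1: GPA on TREATMENT
controlled = ols([treatment, prep], gpa)  # Model 2: adds academic preparation

print("treatment coefficient, no controls:   ", round(naive[1], 2))
print("treatment coefficient, prep controlled:", round(controlled[1], 2))
print("prep coefficient:                      ", round(controlled[2], 2))
```

Without the control, the treatment coefficient absorbs the effect of preparation. With the control, it falls to roughly zero and the preparation coefficient lands near the 0.5 used to build the data.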
Multivariate Regression
This essentially means that treatment had no effect once we control for previous academic preparation.

Model 3 (adds race/ethnicity & gender):
Model is strong. Model explains roughly 89% of the variance.
SAT and HSGPA are largely unchanged and both still significant.
None of the race/ethnicity or gender variables are statistically significant.
This means that race/ethnicity and gender are not strong predictors of GPA after controlling for previous academic preparation.
You could, arguably, exclude race/ethnicity and gender from the model if they are not relevant to theory or previous literature.

Full Model:
Model is strong. Model explains roughly 91% of the variance.
SAT and HSGPA are largely unchanged and both still significant.
None of the controls are statistically significant.
This means that SAT and HSGPA are the key predictors of GPA.

Multivariate Regression
But look at how far off we could have been…
We could have found that Greeks were associated with a lower GPA.
Or that living on campus was associated with a higher GPA.
Or that Pell recipients were associated with a lower GPA.
Or that Asian students had a higher GPA than White students.
But what we found was that SAT score and HSGPA were the only aspects that were associated with a change in GPA.
After controlling for SAT and HSGPA, nothing else significantly impacts GPA.

Multivariate Regression - Example
y = GPA at Graduation
Key variables of interest
Control variables
This is an easy way to summarize significance levels. The more stars, the greater the confidence level.

Appendix 2a. Results of OLS Regressions, Graduating GPA as Dependent Variable (Students Entering Fall 2007)

Variable                    Coefficient
UROP                         0.12***
Study Abroad                 0.06**
COOP                         0.13***
Internship                   0.11***
Minor                        0.12***
International Plan           0.20
Greek                       -0.04
NCAA Athlete                 0.26***
Length Lived On-Campus      -0.01
Pell Recipient              -0.04
SAT Math (100s)              0.13***
SAT Verbal (100s)            0.05***
High School GPA              0.73***
Asian                        0.01
Black                       -0.11*
Hispanic                    -0.07
Other                       -0.01
International                0.04
White (comparison group)     --
Male                        -0.02
GA Resident                 -0.01
Intercept                   -0.84***
N                            2125
Adjusted R-Squared           0.27
* p < 0.05  ** p < 0.01  *** p < 0.001

Notes on the table:
Actual R² values will be more in this range, not 0.9 as with the constructed data.
Participation in the undergraduate research program is associated with a 0.12 higher GPA at graduation. What’s potentially wrong with this? GPA is a requirement for participation in the program.
Participation in the international plan had no significant relationship with GPA at graduation.
Not surprising to find that academic preparation was strongly associated with GPA.
The “White (comparison group)” row is a helpful way to identify the comparison group for categorical variables.
Black students are associated with a lower GPA at graduation. Unfortunately, this trend is prominent in the literature.
Athletes are associated with a higher GPA at graduation. Why might this be?
1. Only athletes who graduate.
2. Small sample size.
3. Additional tutoring.
4. All sports.
5. Other ideas?

Multivariate Regression - Exercise
Open the mreg tab in the Exercises workbook and estimate models for end of course performance in PHYSICS 2102.
Which variables did you include in your model?
Which variables are statistically significant?
What is the magnitude of the variables?
Which variable has the largest coefficient?
When finished, take a short break.

Multivariate Regression - Exercise
Can’t include every combination of a categorical variable.
Sets “White” as the comparison group, which is why everything is 0.

Multivariate Regression - Exercise
Model is strong. Model fit is strong.
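The earlier point that a model can’t include every category of a categorical variable can be sketched in Python with NumPy (an assumption; the workshop uses Excel). The seven students are the ones from the race/ethnicity slide, and the helper name `dummies` is hypothetical. With an intercept plus a binary variable for every category, the binary columns add up to the intercept column, so one column is redundant. Dropping the comparison group removes the redundancy.

```python
import numpy as np

# Race/ethnicity of students 001-007 from the categorical-variables slide.
races = ["White", "Black", "White", "Asian", "Hispanic", "White", "White"]
levels = ["White", "Black", "Hispanic", "Asian"]  # categories observed in this sample


def dummies(values, categories):
    """One 0/1 column per category, named after the affirmative."""
    return np.array([[1.0 if v == c else 0.0 for c in categories] for v in values])


intercept = np.ones((len(races), 1))

# Intercept + a binary variable for EVERY category: the columns are redundant.
every_category = np.hstack([intercept, dummies(races, levels)])

# Intercept + binary variables for every category except the comparison group.
reference_coded = np.hstack(
    [intercept, dummies(races, [c for c in levels if c != "White"])]
)

rank_every = np.linalg.matrix_rank(every_category)
rank_reference = np.linalg.matrix_rank(reference_coded)

print(every_category.shape[1], rank_every)       # 5 columns, rank 4
print(reference_coded.shape[1], rank_reference)  # 4 columns, rank 4
```

When the rank is lower than the number of columns, the coefficients are not uniquely determined, which is consistent with the redundant category showing up as 0 in the exercise output.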
Multivariate Regression - Exercise
Full model:
Performance in PHYS 2101 is positively associated with performance in PHYS 2102. A 1 point increase in your PHYS 2101 grade is associated with a 0.65 point increase in your PHYS 2102 grade.
High School GPA is positively associated with performance in PHYS 2102. A 1 point increase in your High School GPA is associated with an 11.19 point increase in your PHYS 2102 grade.
Black students perform worse in PHYS 2102 than White students. Black students are associated with a 3.47 point lower score in PHYS 2102 than White students.
Hispanic students perform better in PHYS 2102 than White students. Hispanic students are associated with a 4.67 point higher score in PHYS 2102 than White students. Can’t say anything from this model about how well Hispanic students do when compared to Black students.
Male students perform better in PHYS 2102 than Female students. Male students are associated with a 4.25 point higher score in PHYS 2102 than Female students.
Height is negatively associated with performance in PHYS 2102. A 1 inch increase in your height is associated with a 0.51 point decrease in your PHYS 2102 grade.

Multivariate Regression - Exercise
This makes no sense! Why would height be correlated with Physics scores?
It’s not. I created this variable to be correlated with gender. We should drop it.

Multivariate Regression - Exercise
Significant Variables in Full Model
  PHYS 2101 Performance (0.65)
  High School GPA (11.19)
  Black Students (-3.47)
  Hispanic Students (4.67)
  Male Students (4.25)
  Height (-0.51)

Multivariate Regression - Exercise
Model excluding height:
Performance in PHYS 2101 is positively associated with performance in PHYS 2102. A 1 point increase in your PHYS 2101 grade is associated with a 0.62 point increase in your PHYS 2102 grade.
High School GPA is positively associated with performance in PHYS 2102. A 1 point increase in your High School GPA is associated with a 10.84 point increase in your PHYS 2102 grade.
Asian students perform better in PHYS 2102 than White students (just over α = 0.05). Asian students are associated with a 3.06 point higher score in PHYS 2102 than White students.
Black students perform worse in PHYS 2102 than White students. Black students are associated with a 3.50 point lower score in PHYS 2102 than White students.
Hispanic students perform better in PHYS 2102 than White students. Hispanic students are associated with a 3.76 point higher score in PHYS 2102 than White students.

Multivariate Regression - Exercise
                        FULL MODEL   MODEL EXCLUDING HEIGHT
PHYS 2101 Performance    0.65         0.62
High School GPA         11.19        10.84
Black Students          -3.47        -3.50
Hispanic Students        4.67         3.76
Male Students            4.25        NS
Height                  -0.51        (excluded)

Multivariate Regression - Exercise
So what’s the right answer?
I built the data with correlations between PHYS 2102 and:
  PHYS 2101
  SAT & HSGPA (which were correlated)
  Race/Ethnicity
  Gender
Tutoring was randomly generated.

Multivariate Regression - Exercise
Why was SAT not significant?
  SAT was not significant because it’s highly correlated with HSGPA, and that’s what’s capturing the relationship.
Why was gender not significant?
  Because it’s being captured by other variables such as HSGPA and race/ethnicity.
Why didn’t we include height if it was significant in the full model?
  It has nothing to do with PHYS 2102; it reflects a correlation with gender.

Multivariate Regression - Exercise
Why did we include tutoring if it’s not significant?
You could drop it, but it’s theoretically significant to the model.
Not much difference in coefficients.
No relationship with PHYS 2102 nor many other variables.
Want to understand if tutoring was associated with performance in PHYS 2102.

Multivariate Regression - Exercise
Regression results hold all other variables constant, isolating the impact of only the specific variable.
The only thing you are changing is that variable of interest.
Similar to comparing a male to a female with the exact same HSGPA, SAT, tutoring, race/ethnicity, etc.

Multivariate Regression - Exercise
So what’s the “right” answer?
But you won’t know this in the real world. The data won’t be constructed.
Use theory, previous literature, and common sense to develop your models. (Don’t include height as a predictor of performance in PHYS 2102.)
It’s easy to find correlations, but they should be justified. (Morning gym attendance might be correlated with a higher GPA, but it’s due to high motivation, not workout behavior. Don’t force students to go to the gym at 5:30am to try to boost GPAs.)
Logistic Regression
Basic Correlation
Looks at the relationship between multiple independent variables and the dependent variable
Pr(y = 1 | X) = β0 + β1 X1 + β2 X2 + β3 X3 + …
y is the binary dependent variable (1 = yes, 0 = no)

Logistic Regression
β will yield the increase or decrease in the probability of y occurring
  A 1 unit increase in X1 is associated with a 20% increase in the probability of y
y is now a predicted probability between 0 and 1
  White students with a HSGPA of 3.5 and SAT score of 1450 have a probability of 0.71 of graduating within 6 years
Coefficients are difficult to interpret (they are expressed in log-odds, or in odds ratios once exponentiated)
  Resort to more likely or less likely without a magnitude
Not much difference between logistic regression and linear probability estimation when looking around the mean
  If coefficients don’t vary between the models, use linear probability for simplicity

Logistic Regression
Let’s brainstorm some examples where you’d need to use logistic regression.
  Retention
  AA Graduation in 2 years
  AA Graduation in 3 years
  BA Graduation in 4 years
  BA Graduation in 6 years
  Pass / Fail
  Grade
  Yield Rate
  Major
  FT / PT

Logistic Regression
79% retention rate
69% in-state students
53% female

Linear Probability Model
A 100 point increase in SAT scores is associated with a 13 percentage point increase in the probability of being retained.
In-state students are 26 percentage points more likely than out-of-state students to be retained.

Linear Probability Model
An out-of-state, Asian, male student with a SAT score of 1250 and HS GPA of 3.0 would have what probability of being retained?
Pr(y = 1 | X) = −1.56 + 0.00132(1250) + 0.132(3.0) + 0.259(0) − 0.048(1) − 0.026(0) − 0.006(0) + 0.064(0) − 0.045(1)
Pr(y = 1 | X) = 0.39

Linear Probability Model
An in-state, white, female student with a SAT score of 1500 and HS GPA of 3.9 would have what probability of being retained?
Pr(y = 1 | X) = −1.56 + 0.00132(1500) + 0.132(3.9) + 0.259(1) − 0.048(0) − 0.026(0) − 0.006(0) + 0.064(0) − 0.045(0)
Pr(y = 1 | X) = 1.19
Doesn’t make sense. Can’t have a probability of over 1.
Linear Probability Models fail when looking at the extremes (high SAT & GPA).

Logistic Regression
The problem with logistic regression is that the coefficients are difficult to interpret.
From:
Pr(y = 1 | X) = β0 + β1 X1 + β2 X2 + β3 X3 + …
To:
$$\Pr(y = 1 \mid X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots)}}$$
Or, more specifically:
$$\Pr(Y_i = y_i \mid X_i) = \left(\frac{1}{1 + e^{-\beta X_i}}\right)^{y_i} \left(1 - \frac{1}{1 + e^{-\beta X_i}}\right)^{1 - y_i}$$

Logistic Regression
There are two ways of interpreting logistic regression.
  Maximum Likelihood
  Odds Ratios

Logistic Regression
Maximum Likelihood
If positive, more likely
  Higher SAT associated with higher probability of retaining
  Higher HS GPA associated with higher probability of retaining
If negative, less likely
  Asian students less likely than Whites to be retained
  Males less likely than females to be retained
Magnitudes are exponential estimates
  Can’t say much about how X affects y
  Non-linear. No constant β. Have to identify values.

White, in-state, female with 1500 and 3.9:
$$\Pr(y = 1 \mid X) = \frac{e^{-19.75 + 0.015(1500) + 0.405(3.9) + 2.25(1)}}{e^{-19.75 + 0.015(1500) + 0.405(3.9) + 2.25(1)} + 1} = 0.99$$
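The contrast between the two models can be checked numerically. The Python sketch below is a rough illustration, not the workshop’s own code: the coefficients are the rounded values shown on the slides, the dictionary keys and function names are hypothetical, and the three race/ethnicity terms that equal 0 for both example students are left out of the linear probability model. The Asian and male logistic coefficients (−0.78 and −1.11) are taken from the out-of-state example on the slides.

```python
import math

# Rounded coefficients as shown on the slides; key names are illustrative.
LPM = {"intercept": -1.56, "sat": 0.00132, "hsgpa": 0.132,
       "in_state": 0.259, "asian": -0.048, "male": -0.045}
LOGIT = {"intercept": -19.75, "sat": 0.015, "hsgpa": 0.405,
         "in_state": 2.25, "asian": -0.78, "male": -1.11}


def linear_index(coef, student):
    """beta0 + beta1*X1 + beta2*X2 + ... for one student."""
    return coef["intercept"] + sum(coef[name] * value for name, value in student.items())


def lpm_probability(student):
    """Linear probability model: the index itself is read as a probability."""
    return linear_index(LPM, student)


def logit_probability(student):
    """Logistic model: 1 / (1 + e^-(index)), always between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-linear_index(LOGIT, student)))


out_of_state_asian_male = {"sat": 1250, "hsgpa": 3.0, "in_state": 0, "asian": 1, "male": 1}
in_state_white_female = {"sat": 1500, "hsgpa": 3.9, "in_state": 1, "asian": 0, "male": 0}

print(round(lpm_probability(out_of_state_asian_male), 2))  # 0.39
print(round(lpm_probability(in_state_white_female), 2))    # 1.19, not a valid probability
print(logit_probability(in_state_white_female) < 1.0)      # True
```

The linear model reproduces the 0.39 and 1.19 from the slides, while the logistic model keeps the high-SAT, high-GPA student just below 1. Because the slide coefficients are rounded, logistic results for other profiles may differ somewhat from the values printed on the slides.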
Asian, out-of-state, male with 1250 and 3.0:
$$\Pr(y = 1 \mid X) = \frac{e^{-19.75 + 0.015(1250) + 0.405(3.0) - 0.78(1) - 1.11(1)}}{e^{-19.75 + 0.015(1250) + 0.405(3.0) - 0.78(1) - 1.11(1)} + 1} = 0.19$$

Logistic Regression
Odds Ratios
If >1, more likely
  Higher SAT associated with higher probability of retaining
  Higher HS GPA associated with higher probability of retaining
If <1, less likely
  Asian students less likely than Whites to be retained
  Males less likely than females to be retained
Direction of the association is the same
Non-linear. No constant β. Have to identify values.

Linear Probability - Example
y = Pr(Graduating)
Key variables of interest
Control variables

Appendix 4a. Results of Linear Probability, Graduation within 6 Years as Dependent Variable (Students Entering Fall 2007)

Variable                    Coefficient
UROP                         0.14***
Study Abroad                 0.10***
COOP                         0.11***
Internship                   0.12***
Minor                        0.11***
International Plan           0.06
Greek                        0.10***
NCAA Athlete                 0.07
Length Lived On-Campus       0.08***
Pell Recipient               0.02
SAT Math                     0.03*
SAT Verbal                  -0.02*
High School GPA              0.17***
Asian                        0.01
Black                       -0.04
Hispanic                    -0.08*
Other                       -0.08
International               -0.02
White (comparison group)     --
Male                        -0.01
GA Resident                  0.07***
Intercept                   -0.21
N                            2611
Adjusted R-Squared           0.21
* p < 0.05  ** p < 0.01  *** p < 0.001

GPA is still a requirement for participation in many of these programs, which is correlated with the probability of graduating.

Linear Probability - Example
Appendix 3a & 4a.
Results of Linear Probability (Students Entering Fall 2007)

                            Graduation within 4 Years   Graduation within 6 Years
                            as Dependent Variable       as Dependent Variable
UROP                         0.13***                     0.14***
Study Abroad                 0.06**                      0.10***
COOP                        -0.27***                     0.11***
Internship                  -0.00                        0.12***
Minor                        0.04                        0.11***
International Plan          -0.06                        0.06
Greek                       -0.00                        0.10***
NCAA Athlete                 0.25***                     0.07
Length Lived On-Campus       0.04***                     0.08***
Pell Recipient              -0.07**                      0.02
SAT Math                     0.05**                      0.03*
SAT Verbal                   0.01                       -0.02*
High School GPA              0.38***                     0.17***
Asian                        0.05                        0.01
Black                       -0.02                       -0.04
Hispanic                    -0.07                       -0.08*
Other                        0.03                       -0.08
International                0.05                       -0.02
White (comparison group)     --                          --
Male                        -0.08***                    -0.01
GA Resident                 -0.01                        0.07***
Intercept                   -1.53***                    -0.21
N                            2611                        2611
Adjusted R-Squared           0.15                        0.21
* p < 0.05  ** p < 0.01  *** p < 0.001
Undergraduate research and study abroad associated with higher probabilities of graduating.
Students who participated in a co-op program were less likely to graduate within 4 years but more likely to graduate within 6 years.
Students who participated in an internship or who declared a minor were more likely to graduate within 6 years, but neither activity showed a significant association with graduation within 4 years.
As with GPA at graduation, the International Plan program had no significant association with the probability of graduation.
Students who participated in Greek activities were more likely to graduate within 6 years.
Student athletes were more likely to graduate within 4 years, but no more likely to graduate within 6 years.
The longer a student lives on campus, the more likely they are to graduate within 4 or 6 years.
Pell recipients and men less likely to graduate within 4 years.
Students with strong academic preparation more likely to graduate within 4 or 6 years. Though those with high SAT Verbal scores slightly less likely to graduate within 6 years? Tech.
Hispanic students less likely to graduate within 6 years when compared to White students.
In-state students more likely to graduate within 6 years.

Linear Probability - Example
Takeaways
Most programs are successful in increasing probabilities of graduation
  Co-Op delays graduation (1 year program)
  International Plan has no impact on graduation
Requirements to be admitted to the programs suggest that these bright students would likely have graduated on time anyhow
Academic preparation, engagement activities, and student characteristics are all associated with the probability of graduating on time

Logistic Regression - Exercise
Open the lot tab in the Exercises workbook and estimate models for graduation in 4 years using both the linear probability model and the logistic regression model.
Interpret the coefficients and statistical significance in the linear probability model.
What is the probability of graduating within 4 years in the linear probability model for a student with an Academic Integration Index and Social Integration Index at their respective means?
In the logistic regression model, are students with a greater Academic Integration Index score more or less likely to graduate within 4 years?
In the logistic regression model, are students with a greater Social Integration Index score more or less likely to graduate within 4 years?
What is the probability of graduating within 4 years in the logistic regression model for a student with an Academic Integration Index and Social Integration Index at their respective means?

Logistic Regression - Exercise
Academic Integration Index mean is 34.24.
Social Integration Index mean is 24.83.

Logistic Regression - Exercise
A 1 unit increase in the Academic Integration Index is associated with a 0.7 percentage point increase in the probability of graduating within 4 years, holding all else constant.
A 1 unit increase in the Social Integration Index is associated with a 1.2 percentage point increase in the probability of graduating within 4 years, holding all else constant.

Logistic Regression - Exercise
A student with an Academic Integration Index score of 34.24 and Social Integration Index score of 24.83 would be predicted to have a 0.40 probability of graduating within 4 years.
The linear regression evaluated at the means of the $X$ variables gives you the mean of the $y$ variable.

Logistic Regression - Exercise
A higher Academic Integration Index is associated with a greater probability of graduating within 4 years, holding all else constant.
A higher Social Integration Index is associated with a greater probability of graduating within 4 years, holding all else constant.

Logistic Regression - Exercise
A student with an Academic Integration Index score of 34.24 and Social Integration Index score of 24.83 would be predicted to have a 0.85 probability of graduating within 4 years.
This is much higher than the 0.40 from the linear regression.
Those at or above the mean are much more likely to graduate within 4 years than those below the mean.

Break

Propensity Score Matching

TREATMENT GROUP   CONTROL GROUP   TREATMENT EFFECT
82                79              3
86                82              4
85                81              4
82                80              2
81                79              2
80                78              2
Average                           2.83

Propensity Score Matching
The question with matching is how you determine a good match.
Develop propensity scores (weighted values using logistic regression on a series of variables)
Match those with close propensity scores
  1-to-1 match with replacement
  1-to-1 match without replacement
  Multiple matches
  Many more varieties
Compare matches on outcome OR use matches in multiple regression

Propensity Score Matching
Multivariate Regression
$$\text{GPA} = \beta_0 + \beta_1\,\text{TREATMENT} + \beta_2\,\text{ACADEMIC\_PREP} + \beta_3\,\text{CONTROLS}$$
Propensity Scores
$$\Pr(\text{TREATMENT}) = \beta_0 + \beta_1\,\text{ACADEMIC\_PREP} + \beta_2\,\text{CONTROLS}$$
Where the $X$ variables are characteristics and variables associated with whether they chose to participate in the treatment
The $X$ variables cannot be associated with the eventual outcome, only with the treatment
Develops Pr(TREATMENT) for each individual based on this regression and their characteristics

Propensity Score Matching
$$\Pr(\text{TREATMENT}) = \beta_0 + \beta_1\,\text{ACADEMIC\_PREP} + \beta_2\,\text{CONTROLS}$$
The Pr(TREATMENT) for each individual is their propensity score: the propensity that they participated in the treatment
Use this score to find a match to someone with a similar propensity, but who didn't participate in the treatment
Not everyone will have a match. Analysis limited to students who overlap.
  If everyone with an SAT score over 1500 participated, there would be no students left to compare against who didn't participate.
Only matches based on observed characteristics. Omitted variable bias still possible in developing matches.
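The treatment effect in the six matched pairs shown at the start of this propensity score matching section is the mean of the treated-minus-control differences. A minimal sketch of that arithmetic, using the pair values from the slide (the function and variable names are illustrative assumptions):

```python
# (treatment outcome, control outcome) for each matched pair on the slide.
matched_pairs = [(82, 79), (86, 82), (85, 81), (82, 80), (81, 79), (80, 78)]

def average_treatment_effect(pairs):
    """Mean of the treated-minus-control differences across matched pairs."""
    differences = [treated - control for treated, control in pairs]
    return sum(differences) / len(differences)

effect = average_treatment_effect(matched_pairs)
print(round(effect, 2))
```

The same function applies to any list of matched pairs, which is why the quality of the matches, not the arithmetic, is the hard part of the method.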
Propensity Score Matching
Matching
Using the propensity scores (Pr(TREATMENT))
Nearest Neighbor: find closest match
Caliper: find closest match within a given band
  Caliper <= 0.01
  The rest have no overlap (no matches)
Replacement
  With replacement: multiple treatment group individuals can be matched to the same control group individual
  Without replacement: control group individual is matched to single treatment individual
    Closest match: match closest treatment and control group scores
    First match: treatment scores are sorted and matched 1-to-1 to a control individual

TREATMENT GROUP: 0.599794, 0.726072, 0.827099, 0.724683, 0.551767, 0.029345, 0.279154, 0.686264, 0.006545, 0.946441
CONTROL GROUP: 0.325386, 0.585769, 0.543578, 0.076511, 0.027491, 0.922861, 0.861053, 0.995708, 0.611816, 0.590846

Sorted (for first match):
TREATMENT GROUP: 0.006545, 0.029345, 0.279154, 0.551767, 0.599794, 0.686264, 0.724683, 0.726072, 0.827099, 0.946441
CONTROL GROUP: 0.027491, 0.076511, 0.325386, 0.543578, 0.585769, 0.590846, 0.611816, 0.861053, 0.922861, 0.995708

My preferred model. Ensures close, accurate matches. Eliminates individuals with no match / poor matches. But make sure one control observation does not dominate the matches.

Propensity Score Matching
Those who participated in a test prep program had an SAT score that was roughly 400 points higher than those who did not.

Propensity Score Matching
But look at how much higher the GPA was for students who participated in the test prep program. Maybe this is what was causing the difference in SAT scores.

Propensity Score Matching
Could only find 7 matches out of 150 people. This means that the treatment and control group differed a lot. Lack of overlap.

Propensity Score Matching
For these 7 matches, there was only a 191 point difference in SAT scores. Much different than the 400 point difference observed earlier.
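The nearest-neighbor, caliper, and replacement rules described above can be sketched with the ten treatment and ten control propensity scores from the slides. This is a minimal illustration under stated assumptions, not the routine used in the workshop software: the function name is hypothetical, and matching without replacement is done greedily in the order the treated scores are listed.

```python
def nearest_neighbor_match(treated, control, caliper=None, replacement=True):
    """Match each treated score to its closest control score.

    Returns (treated_score, control_score) pairs. A treated score whose
    closest control lies outside the caliper is left unmatched.
    """
    available = list(control)
    matches = []
    for t in treated:
        if not available:
            break
        best = min(available, key=lambda c: abs(c - t))
        if caliper is not None and abs(best - t) > caliper:
            continue  # no overlap for this treated individual
        matches.append((t, best))
        if not replacement:
            available.remove(best)  # each control is used at most once
    return matches

treatment_scores = [0.599794, 0.726072, 0.827099, 0.724683, 0.551767,
                    0.029345, 0.279154, 0.686264, 0.006545, 0.946441]
control_scores = [0.325386, 0.585769, 0.543578, 0.076511, 0.027491,
                  0.922861, 0.861053, 0.995708, 0.611816, 0.590846]

# Caliper of 0.01, as on the slide: only close pairs survive.
caliper_matches = nearest_neighbor_match(treatment_scores, control_scores, caliper=0.01)

# No caliper, with replacement: everyone is matched, however poorly.
all_matches = nearest_neighbor_match(treatment_scores, control_scores)
reuse_of_0611816 = sum(1 for _, c in all_matches if c == 0.611816)

print(caliper_matches)
print(reuse_of_0611816)
```

With the caliper, only three treated scores find a control within 0.01. Without it, the control score 0.611816 is reused for three different treated individuals, which is the kind of domination by one control observation the slide warns about.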
Propensity Score Matching
This time we have the opposite. There is a 294 point difference, but the student demographics are fairly equivalent, including GPA. This means there should be a lot more matches.

Propensity Score Matching
Not only did 40/40 = 100% of the treatment population have matches, but there was also a significant difference of 328.8. This is 34 points higher (12%) than just comparing participants to non-participants.

Propensity Score Matching - Example

                            GPA at Graduation           Probability of Graduating within 6 Years
                            (Multivariate Regression)   (Linear Probability)
UROP                         0.12***                     0.14***
Study Abroad                 0.06**                      0.10***
COOP                         0.13***                     0.11***
Internship                   0.11***                     0.12***
Minor                        0.12***                     0.11***
International Plan           0.20                        0.06
Greek                       -0.04                        0.10***
NCAA Athlete                 0.26***                     0.07
Length Lived On-Campus      -0.01                        0.08***
Pell Recipient              -0.04                        0.02
SAT Math                     0.13***                     0.03*
SAT Verbal                   0.05***                    -0.02*
High School GPA              0.73***                     0.17***
Asian                        0.01                        0.01
Black                       -0.11*                      -0.04
Hispanic                    -0.07                       -0.08*
Other                       -0.01                       -0.08
International                0.04                       -0.02
White (comparison group)     --                          --
Male                        -0.02                       -0.01
GA Resident                 -0.01                        0.07***
Intercept                   -0.84***                    -0.21
N                            2125                        2611
Adjusted R-Squared           0.27                        0.21
* p < 0.05  ** p < 0.01  *** p < 0.001

Both models could be biased because there are requirements to be admitted to these programs.

Propensity Score Matching - Example
Appendix 1b.
Logistic Regressions to Develop Propensity Scores
Note: Only coefficients significant at the p < 0.001 level are included, so some rows list fewer than one value per program.

Columns: UROP, Study Abroad, COOP
Greek: -0.15, 0.8, 0.30
NCAA Athlete: -1.56, -3.23
Length Lived On-Campus: 0.17, 0.16
Pell Recipient: 0.45, -0.30
SAT Math: 0.37, 0.22, 0.26
SAT Verbal: 0.23, 0.14, -0.14
High School GPA: 0.67, 0.53, 1.26
Asian: 0.37, -0.16, -0.28
Black: -0.58
Hispanic: 0.38, 0.33
Other: 0.46
International: 0.54
White (comparison group)
Male: -0.41, -0.56, 0.46
GA Resident: -0.37, -0.30

Internship (significant coefficients, in row order): 0.49, 0.20, 0.64, 0.31, 0.78, 0.27, 0.51, 0.53, -0.22
Minor (significant coefficients, in row order): 0.13, 0.17, 0.42, 0.37, -0.40, 0.23
Int'l Plan (significant coefficients, in row order): 0.71

       UROP   Study Abroad   COOP   Internship   Minor   Int'l Plan
ROC    0.66   0.67           0.64   0.64         0.64    0.70
GOF    0.66   0.45           0.74   0.16         0.07    0.41

These programs are now the dependent variable. We're estimating probabilities of participating in these programs given student characteristics.

Greeks very involved. Athletes don't have time to study abroad. Academic preparation linked to most programs. Etc.
Receiver Operating Characteristic Curve (ROC) and Goodness of Fit (GOF) are estimates of model fit.
These scores are really low. Suggests the models are not strong and matches will likely be poor.

Propensity Score Matching - Example
Tables 2, 3. Probabilities of Participation
Another way to present probabilities.
[Table: direction of each coefficient (↑ or ↓, with -- where none is shown) for UROP, Study Abroad, COOP, Internship, Minor, and Int'l Plan, using the same rows as Appendix 1b with White as the comparison group]

       UROP   Study Abroad   COOP   Internship   Minor   Int'l Plan
ROC    0.66   0.67           0.64   0.64         0.64    0.70
GOF    0.66   0.45           0.74   0.16         0.07    0.41

Propensity Score Matching - Example
Each individual's characteristics are then applied to this regression to develop propensity scores for the probability of their participation
These propensity scores are then matched to non-participants

Propensity Score Matching - Example
Appendix 1a. Results of Propensity Score Matching

GPA            Estimate   Treated   Control   Matches
UROP            0.16***   4045      17,650    2939
Study Abroad    0.14***   4725      16,970    3319
COOP            0.12***   4351      17,344    3091

4-Year Grad    Estimate   Treated   Control   Matches
UROP            0.11***   4045      17,650    2684
Study Abroad    0.07***   4725      16,970    4239
COOP           -0.24***   4351      17,344    4125

6-Year Grad    Estimate   Treated   Control   Matches
UROP            0.14***   4045      17,650    3684
Study Abroad    0.19***   4725      16,970    4239
COOP            0.17***   4351      17,344    4125

Note: Models for Internships, Minors, and the International Plan fail measures of model fit.

Now, when matching participants to non-participants, we see that
UROP associated with higher GPAs and probability of graduating within 4 or 6 years.
Study Abroad associated with higher GPAs and probability of graduating within 4 or 6 years.
COOP associated with higher GPAs and probability of graduating within 6 years. But lower probability of graduating within 4 years.

Propensity Score Matching - Example
Summarized Results

GPA            Regression           PSM
UROP            0.12***              0.16***
Study Abroad    0.06**               0.14***
COOP            0.13***              0.12***

4-Year Grad    Linear Probability   PSM
UROP            0.13***              0.11***
Study Abroad    0.06**               0.07***
COOP           -0.27***             -0.24***

6-Year Grad    Linear Probability   PSM
UROP            0.14***              0.14***
Study Abroad    0.10***              0.19***
COOP            0.11***              0.17***

Note: Models for Internships, Minors, and the International Plan fail measures of model fit.

The coefficients are really close between the different models. That's a good sign that the estimates are relatively accurate.
If they differed greatly, there could be bias in the regressions or poor matches.

Time Series Analysis
Regression, Logistic Regression, and Propensity Score Matching all assume there is no time trend
  Participants and Non-Participants must be from the same cohort
  Only one time period in the data
Time series analysis uses a type of regression to look for changes over time
  Time must be an independent variable
Time series analysis is also known as longitudinal analysis
Tracking the same individual over time is called panel analysis
The type of panel analysis I'm going to teach is called Fixed Effects

Time Series Analysis
Multiple Regression
$$y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \varepsilon$$
Time Series Analysis
$$y_{it} = \beta_0 + \beta_1 X_{1it} + \beta_2 X_{2it} + \beta_3 X_{3it} + \dots + \alpha_i + \delta_t + \varepsilon_{it}$$
  $\varepsilon$ is an error component
  $\alpha$ controls for individuals
  $\delta$ controls for time
  $i$ is the identifier per individual
  $t$ is the identifier per time period

Remember how categorical variables were transformed into binary variables? We're essentially doing the same for each time and individual.
Year = 1999: Year1997 = 0, Year1998 = 0, Year1999 = 1, Year2000 = 0, …
ID = 103: ID102 = 0, ID103 = 1, ID104 = 0, …

Time Series Analysis
$$y_{it} = \beta_0 + \beta_1 X_{1it} + \beta_2 X_{2it} + \beta_3 X_{3it} + \dots + \alpha_i + \delta_t + \varepsilon_{it}$$
In essence, this looks for changes over time for a given individual
Things that do not vary over time get dropped from the analysis
A $100,000 increase in spending on student services is associated with a 2 percentage point increase in graduation rates

Time Series Analysis
[Figure: Basic Time Series, 2003 to 2015, with a single fitted trend line: y = 1.1225x - 2244.7, R² = 0.3691]

Time Series Analysis
[Figure: Time Series with Controls for Individual, 2003 to 2015, with one fitted line per individual: y = 4x - 8035; y = 3.5x - 7008.5; y = 3x - 6022; y = 2x - 4006; y = 2x - 4012]

Time Series Analysis - Example
Each column is associated with a different dependent variable (expenditures).
Within an institution, a $100 increase in state appropriations per FTE is associated with a $26 increase in instruction after controlling for other sources of revenues and time trends.
More realistically, a $100 DECREASE in state appropriations per FTE is associated with a $26 DECREASE in instructional expenses.
Not surprisingly, a strong link between tuition revenue and instructional expenses.
A similarly strong link between grants and contracts with research expenses.
Virtually no relationship between spending and retention rates.
Few relationships between spending and graduation rates (and marginal significance).
Increasing expenses for scholarships and fellowships reduced 4-year graduation rates.
Increasing tuition increases the 6-year graduation rate.
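To make the within-unit idea behind these fixed effects results concrete, the sketch below uses a small hypothetical panel (two institutions, three years; none of these numbers come from the workshop data). Subtracting each institution's own mean is equivalent to including an ID dummy for every institution. It shows two things stated on the slides: a variable that never changes within a unit drops out, and the within-unit slope can differ sharply from the pooled slope. Time dummies ($\delta_t$) are left out to keep the example short.

```python
def demean_by_group(values, groups):
    """Subtract each group's mean from its own observations (within transform)."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]

def slope(x, y):
    """OLS slope for data that has already been centered."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Hypothetical panel: institutions A and B, three years each.
ids     = ["A", "A", "A", "B", "B", "B"]
spend   = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]       # varies over time
public  = [1, 1, 1, 0, 0, 0]                   # never changes within a unit
outcome = [12.0, 14.0, 16.0, 8.0, 10.0, 12.0]  # A: y = 10 + 2x, B: y = 2x

# Fixed effects: compare each institution only with itself over time.
within_slope = slope(demean_by_group(spend, ids), demean_by_group(outcome, ids))

# Pooled regression: ignore which institution each row belongs to.
everyone = ["all"] * len(ids)
pooled_slope = slope(demean_by_group(spend, everyone), demean_by_group(outcome, everyone))

# A time-invariant variable is all zeros after demeaning, so it drops out.
public_within = demean_by_group(public, ids)

print(within_slope, pooled_slope, public_within)
```

In this made-up panel the pooled slope is negative because institution B spends more but starts from a lower baseline, while the within-institution slope recovers the increase of 2 built into both institutions.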
Time Series Analysis - Example
Increasing tuition or reducing scholarships/fellowships may motivate students to graduate faster so they don’t have to pay more. But probably not a good policy idea.

Time Series Analysis - Example Takeaways
Time Series Analysis is very similar to OLS regression
But it goes another step to control for time trends
And it looks at changes within a unit of analysis over time
This helps to move from a simple correlation (e.g., institutions with large enrollments are associated with larger numbers of administrators)
To a more robust analysis of changes for a unit over time (e.g., as enrollment increases by 100 students for an institution, staff/administration is expected to increase by 2)
This helps control for cross-institutional differences (e.g., looking at changing enrollment patterns for an institution rather than comparing enrollment patterns from institutions of different types or populations)

Break

Interpretation and Application

Interpretation and Application
What are some examples of findings we’ve discovered?
Correlational
A 100 point increase in SAT scores is associated with a 0.12 higher freshman GPA after controlling for all other variables.
A 1% increase in state appropriations is associated with a 0.8% increase in instructional expenses holding all else constant.
A 0.1 increase in high school GPA is associated with a 9% increase in the probability of retaining after controlling for all other variables.
Quasi-Experimental
Participation in the undergraduate research program is associated with a 14% increase in the probability of graduating within 6 years when compared to non-participants.
Those who participated in study abroad were associated with higher earnings at graduation as compared to non-participants.

Interpretation and Application
Which technique matches to each of these? (Regression, Logistic Regression, Propensity Score Matching, or Time Series)
Correlational
A 100 point increase in SAT scores is associated with a 0.12 higher freshman GPA after controlling for all other variables. REGRESSION.
A 1% increase in state appropriations is associated with a 0.8% increase in instructional expenses holding all else constant. TIME SERIES.
A 0.1 increase in high school GPA is associated with a 9% increase in the probability of retaining after controlling for all other variables. LOGISTIC REGRESSION.
Quasi-Experimental
Participation in the undergraduate research program is associated with a 14% increase in the probability of graduating within 6 years when compared to non-participants. PSM.
Those who participated in study abroad were associated with higher earnings at graduation as compared to non-participants. PSM.

Interpretation and Application
What are some key phrases to use and understand in the interpretation and application?

“Is Associated”
We can’t say anything about causation, so we can’t use causal language. We can’t say X caused Y. Instead, say that changing X is associated with a change in Y.

“After controlling for…” or “Holding all else constant”
There may be other variables that were unaccounted for, so we have to say that the observed coefficient is based on controlling for these certain variables.

“When compared to non-participants”
Just like with “holding all else constant”, we have to specify that these results are based only on comparing participants with the non-participants we selected as their matches.

Interpretation and Application
What’s the difference between statistical significance and practical significance?
Statistical Significance
Basically just a p-value of less than 0.05
Suggests there is a relationship between the independent variable (X) and the dependent variable (y)
Practical Significance
This varies by variable; it looks at the magnitude of β
Most magnitudes in social sciences and behavioral studies will be between 0-5%
Some magnitudes are not feasible
Example
A 3 percentage point increase in overall graduation rates is a major difference
Moving from 78% graduating within 4 years to 81% graduating within 4 years
A 3 percentage point increase in a student’s probability of graduation is small
Moving a person from a 91% probability of graduating within 4 years to a 94% probability is small
A $10,000 increase in funding per FTE is associated with a 5% increase in public service participation
Doesn’t make sense if current funding levels are $6,000 per FTE
Is a 5% increase in public service participation worth the extra funding?

Interpretation and Application
What about translating into non-statistical speak?
Be careful about not over-stating your findings
Consider both the statistical and practical significance
Keep it simple and straightforward in the executive summary
Add the details and statistical jargon in an appendix

Appendix 1a.
Results of Propensity Score Matching

Outcome      Program       Estimate  Treated  Control  Matches
GPA          UROP          0.16***   4,045    17,650   2,939
GPA          Study Abroad  0.14***   4,725    16,970   3,319
GPA          COOP          0.12***   4,351    17,344   3,091
4-Year Grad  UROP          0.11***   4,045    17,650   2,684
4-Year Grad  Study Abroad  0.07***   4,725    16,970   4,239
4-Year Grad  COOP          -0.24***  4,351    17,344   4,125
6-Year Grad  UROP          0.14***   4,045    17,650   3,684
6-Year Grad  Study Abroad  0.19***   4,725    16,970   4,239
6-Year Grad  COOP          0.17***   4,351    17,344   4,125

Note: Models for Internships, Minors, and the International Plan fail measures of model fit.

“The results of this study indicate that the Undergraduate Research Opportunities Program (UROP), study abroad, and Co-Op programs are all very successful in improving student outcomes when comparing participants to non-participants of a similar profile.”

Interpretation and Application
So we should push all of our students to participate in these programs?
Not necessarily. While these programs likely helped the students that participated, there are still a number of factors that could be preventing unbiased results. This only says that for participants, they likely did better than they would have had they not participated.
Statistical analyses are about average effects, not the effect for any one individual. Students benefitted on average, while there may have been others that did not fare as well. Some students may not benefit from any or all of these programs.

Interpretation and Application
Thoughts on when to use a table versus a graph?
Most people don’t know how to interpret regression coefficients
But for those that do, a table provides all the statistical details
Graphs can easily display a lot of information
Graphs are very helpful when looking at trends over time

Interpretation and Application
When to use a table versus a graph.
Table – when coefficients and statistical significance are most important
Line Graph – when looking at continuous changes over time
Bar Graph – when looking at categorical data
Pie Graph – when looking at proportions of a whole
Scatterplot – when graphing raw data and the best-fit line
Interactive Graph – using special software that allows animations

Interpretation and Application
RAW DATA
Year  In-State Tuition
2000  $10,500
2001  $11,550
2002  $12,359
2003  $13,347
2004  $14,548
2005  $14,694
2006  $15,576
2007  $16,822
2008  $17,999
2009  $19,259
2010  $19,644
2011  $20,430
2012  $21,043
2013  $21,043
2014  $21,253
2015  $22,741

SUMMARY OUTPUT
Regression Statistics
Multiple R         0.990842
R Square           0.981769
Adjusted R Square  0.980466
Standard Error     545.8353
Observations       16

ANOVA
            df  SS        MS        F        Significance F
Regression  1   2.25E+08  2.25E+08  753.902  1.41E-13
Residual    14  4171107   297936.2
Total       15  2.29E+08

                  Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept         -1614630      59426.33        -27.1703  1.63E-13  -1742087   -1487173   -1742087     -1487173
In-State Tuition  812.7924      29.60208        27.45728  1.41E-13  749.3023   876.2825   749.3023     876.2825

In-State Tuition is increasing by roughly $800 per year.

Interpretation and Application
[Figure: In-State Tuition by year, 2000 to 2015; y-axis $0 to $25,000; trend line y = 812.79x + 10142]
In-State Tuition is increasing by roughly $800 per year.

Interpretation and Application
Enrollment
[Figures: two enrollment charts by group (Asian, Black, Hispanic, Other, White, Total), 2012 to 2014; y-axis 0 to 25,000]

Year  Asian  Black  Hispanic  Other  White  Total
2012  6570   1273   1341      743    11630  21557
2013  6688   1289   1428      779    11287  21471
2014  7466   1386   1533      974    11750  23109

These are technically the correct ways to display this information.

Interpretation and Application
Enrollment
[Figure: enrollment chart with series White, Asian, Hispanic, Black, Other, 2012 to 2014; y-axis 0 to 25,000; same data table as above]
But if you want to fudge the rules on continuous data a bit, you could display it like this.
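The trend line on the in-state tuition slide can be checked by hand. The sketch below fits an ordinary least squares line to the 16 tuition values from the raw data, once with the calendar year as x and once with x coded 1 to 16. The function name `ols_line` is hypothetical, and the 1-to-16 coding is an assumption inferred from the chart's intercept, not something the slides state.

```python
# In-state tuition values as listed in the RAW DATA slide.
years = list(range(2000, 2016))
tuition = [10500, 11550, 12359, 13347, 14548, 14694, 15576, 16822,
           17999, 19259, 19644, 20430, 21043, 21043, 21253, 22741]


def ols_line(xs, ys):
    """Return (slope, intercept) of the least squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# x = calendar year, as in the regression summary output
slope_year, intercept_year = ols_line(years, tuition)

# x = 1, 2, ..., 16 (assumed coding behind the chart's trend line equation)
index = list(range(1, len(tuition) + 1))
slope_index, intercept_index = ols_line(index, tuition)

print("year as x:  slope", round(slope_year, 2), "intercept", round(intercept_year))
print("1..16 as x: slope", round(slope_index, 2), "intercept", round(intercept_index))
```

Both codings give essentially the same slope, which is the "roughly $800 per year" reading. Only the intercept changes, which would explain why the summary output and the chart equation show such different constants: the intercept is the predicted tuition at x = 0, and x = 0 means something very different under each coding.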
http://enrollment.irp.gatech.edu/

Interpretation and Application
RAW DATA
Category                 Expenditures per FTE
Instruction              $13,939
Research                 $31,454
Public Service           $2,589
Academic Support         $2,632
Institutional Support    $3,300
Student Services         $1,717
Other Core Expenditures  $604

[Figure: pie chart of Expenditures per FTE by category; slice labels 56%, 25%, 6%, 5%, 4%, 3%, 1%]

Takeaways

We’ve Learned…
How to develop a good research question
How bias can arise in simple comparisons
  Self-Selection
  Population versus Sample
  Spurious Relationships & Omitted Variables
  Simultaneity & Reverse Causality
  History
About experimental designs
  Causality
  Random Assignment
  Treatment & Control Groups
  Pre-Test & Post-Test
About quasi-experimental designs
  Regression
  Multivariate Regression
  Logistic Regression
  Propensity Score Matching
  Time Series & Longitudinal Analyses
About research ethics
About mathematical models
  Data Types
  Confidence Intervals
  t-tests
  Designs in Excel
  Designs in SAS
  Examples
  Exercises
About interpretation and application
  Coefficients
  Statistical & Practical Significance
  Graphs & Tables

Takeaways
What was the most important takeaway for you?
What was most helpful?
What needed additional attention?
What questions have not been answered?

Additional Resources
Coelli, T. J., Rao, D. S. P., O’Donnell, C. J., & Battese, G. E. (2005). An introduction to efficiency and productivity analysis (2nd ed.). New York, NY: Springer.
Schneider, B., Carnoy, M., Kilpatrick, J., Schmidt, W. H., & Shavelson, R. J. (2007). Estimating causal effects: Using experimental and observational designs. Washington, DC: American Educational Research Association.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2001). Experimental and quasi-experimental designs for generalized causal inference (2nd ed.). Boston, MA: Houghton Mifflin.
Wooldridge, J. M. (2009). Introductory econometrics: A modern approach (4th ed.). Mason, OH: South-Western Cengage Learning.
This one is a short handbook that would be a good, quick reference.

Correlation, Causation, & Evaluation
A PRACTITIONER’S GUIDE TO RESEARCH METHODS
JUSTIN C. SHEPHERD, PH.D.
JUSTIN.SHEPHERD@IRP.GATECH.EDU