Technical Efficiency - Georgia Tech | Institutional Research and

Correlation, Causation, & Evaluation
A PRACTITIONER’S GUIDE TO RESEARCH METHODS
JUSTIN C. SHEPHERD, PH.D.
Why it’s a good idea you signed up for this workshop…
 Big data and analytics are becoming increasingly important
 Data driven decision making
 Eliminate hunches or guesstimates
 Improper methods can lead to misleading results
 No data is better than bad data
 Turn data into results!
 Provide accurate, meaningful findings and interpret appropriately to provide context and
understanding
Introductions
 Name
 Affiliation
 Title / Responsibilities
 Statistical Expertise
 What you’re hoping to learn
Outline
 Developing Research Questions (1 hour)
 Strategizing & Game Planning (2 hours)
 Math from a Fire Hose (2.5 hours)
 Interpretation & Application (1.5 hours)
* Short breaks every 45 minutes to 1 hour
Developing Research Questions
 What are the issues with this statement?
Example #1
 Magnitude of an Effect Size
 Think about the denominator
 Standard Error
 Statistical Significance
 What are some similar examples we’ve encountered?
 150% increase in enrollment for Pacific Islanders.
 Went from 2 students to 5 but only account for <1% of the total student
population.
 50 point average increase in SAT scores after enrollment in SAT
prep course.
 For a course with 3 people in it.
 1 person increased by 200 points. 1 person stayed the same. 1 person decreased
by 50 points. 33% success rate? 33% failure rate? Helpful?
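The arithmetic behind these two examples can be checked directly. The sketch below uses only the numbers quoted above; the function name is hypothetical, and the standard error is the usual sample formula (sample standard deviation divided by the square root of n).

```python
import math
import statistics

def percent_change(before, after):
    """Percent change relative to the starting value (the denominator)."""
    return (after - before) / before * 100

# Enrollment example: 2 students grew to 5.
enrollment_change = percent_change(2, 5)  # 150.0 percent, but only 3 more students

# SAT prep example: score changes for the 3 students in the course.
score_changes = [200, 0, -50]
mean_change = statistics.mean(score_changes)         # average change in points
std_dev = statistics.stdev(score_changes)            # sample standard deviation
std_error = std_dev / math.sqrt(len(score_changes))  # standard error of the mean

print(f"Enrollment change: {enrollment_change:.0f}% ({5 - 2} students)")
print(f"Mean SAT change: {mean_change:.0f} points, standard error {std_error:.1f}")
```

The standard error here is larger than the 50 point average itself, so with three students the average increase cannot be told apart from no change at all.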
 What are the issues with this statement?
 Politics ≠ Statistical Results . . . And that’s okay . . . sometimes
 What are some similar examples we’ve encountered?
 Save $1 million / year by keeping computers 1 year longer
Example #2
 Slow operating system. Lower morale. Dated software and support.
 Average high school GPA could increase if we only accepted
international students.
 Access. Equity. Taxpayer accountability.
 What are the issues with this statement?
Example #3
 Data → Decision NOT Decision → Data
 Think of your population
 Think of your sample
 Think of the limitations
 What are some similar examples we’ve encountered?
 A sample of remedial students have lower GPAs than traditional
students.
 Participating in Greek affairs increases participation in student
activities.
 Students who went to the gym before 8am were more likely to
retain.
 What are the issues with this statement?
 Likert Scales
 “Not Applicable”, “No Opinion”, “Undecided” or “Neutral”
 What are some similar examples we’ve encountered?
Example #4
 Who reads Inside Higher Ed? Who comments on Inside Higher
Ed? How might these people be different than the population?
Good questions beget good answers.
 These examples are widespread!
 Learn how to think for yourself.
 Learn how to draw your own conclusions from raw data, not fancy graphics.
 Learn how to protect your work from naysayers.
Developing Research Questions
 What are some common questions?
 Where do they arise from?
 What makes a “good” question versus a “bad” question?
Developing Research Questions
COMMON QUESTIONS
 Example 1
SOURCE / FROM
 Example 1
Things to consider when developing a question
[Diagram: the question “What would happen if we admitted more Freshmen?” sits at the center and branches into Admissions, Housing, Infrastructure, Well Being, Safety, Academic Support, and Parking. Sub-topics shown include Selectivity, Transfer Cohort, Yield, High School GPA, SAT Scores, On-Campus, Off-Campus, RA’s, Room & Board Costs, Maintenance, Space, Hours of Operation, Counseling, Gym Facilities, Parking Spaces, Ticketing, Campus Police, Emergency Management, Escorts / Rides, Tutoring, and Advising.]
Things to consider when developing a question
 Even relatively simple questions can turn into complex analyses very quickly.
 Is the question specific?
 What are the units of analysis?
 How were they selected?
 How long is the observation time?
 What are our resources (time & money)?
Developing Research Questions
 Was our program successful?
 What program? What’s the definition of success?
 Specificity
 Did our first-year orientation program result in an increase to retention rates?
Developing Research Questions
 Did our first-year orientation program result in an increase to retention rates?
 For whom?
 Treatment versus Control
 Participants in the first-year orientation program as compared to non-participants
 Did our first-year orientation program result in an increase to retention rates for participants as compared to non-participants?
Developing Research Questions
 Did our first-year orientation program result in an increase to retention rates for participants as
compared to non-participants?
 What if it’s required of everyone? There is no treatment or control.
 How do we know it’s not a coincidence?
 Before and After
 Do we have enough historical data to make a claim?
 Did the introduction of our first-year orientation program in AY 2010 result in an increase to retention rates for participants as
compared to non-participants?
Developing Research Questions
 Did the introduction of our first-year orientation program in AY 2010 result in an increase to
retention rates for participants as compared to non-participants?
 Was it voluntary? Why did they participate?
 What would have happened if they hadn’t participated?
 Random Assignment
 This is where we often fail. Most students self-select to participate in a program. There is no random assignment.
 Without random assignment, we can’t simply compare participants to non-participants.
 Did the introduction of our first-year orientation program in AY 2010 result in an increase to retention rates for participants who
otherwise would not have participated?
Developing Research Questions
 Did the introduction of our first-year orientation program in AY 2010 result in an increase to
retention rates for participants as compared to non-participants?
 Was it worth it? How much did it cost?
 Cost-Benefit Analyses
 In higher education, this is frequently subjective. It’s hard to measure quality.
 Was the identified 1.5% increase to retention rates for participants of the first-year orientation program worth the $3 million
investment?
Developing Research Questions
 Break into groups and discuss real examples of research questions.
 What were some good components of the questions?
 Were there areas of miscommunication / misunderstanding?
 What was the result?
 How could they have been improved?
 Use the examples provided or create your own.
 Write these down. This is important, we’ll be using these throughout the day.
 Discussion.
Developing Research Questions
 Why can’t you just compare people that participated to people that didn’t?
 Selection Bias
 Self-Selection
 The people that chose to participate are different than those who didn’t participate.
 Student government leaders are inherently different from the rest of the class.
 Student government may not have improved their abilities; those with high abilities chose to run for student government.
Developing Research Questions
 Why can’t you just compare people that participated to people that didn’t?
 Selection Bias
 Forced Selection
 The people that were forced to participate are different than those who didn’t participate.
 Those who are required to seek academic tutoring are different than those who are not required.
 Tutoring may have helped, but it may not look like much since they struggled in the first place.
Developing Research Questions
 Why can’t you just compare people that participated to people that didn’t?
 Selection Bias
 Spurious Relationships
 Unobserved student characteristics matter.
 Student government leaders have a higher motivation and stronger social ties.
 Those needing tutoring may be struggling because of financial need, employment, or family obligations.
 Those needing tutoring may also be receiving private tutoring or outside academic counseling.
Developing Research Questions
 Why can’t you just compare people that participated to people that didn’t?
 Simultaneity / Reverse Causality
 Both the treatment and outcomes are changing at the same time.
 As crimes on campus increase, the size of the campus police force is increased.
 But a larger police force doesn’t equate to more crime, even though there is a positive correlation.
[Diagram: Campus Crime and Campus Police, linked by a Positive Relationship.]
[Diagram, continued: a Negative Relationship is added between Campus Police and Campus Crime, alongside the Positive Relationship.]
Developing Research Questions
 Why can’t you just compare people that participated to people that didn’t?
 History
 Environmental changes may cause a temporary or permanent shift.
Introduction of the MS in Computer Science MOOC.
Developing Research Questions
 Why can’t you just compare people that participated to people that didn’t?
 Selection Bias
 Self-Selection
 Spurious Relationships
 Simultaneity / Reverse Causality
 History
 Without a traditional experiment, you have to use math to help you identify treatment effects.
UH-OH
Developing Research Questions
 Knowing what we now know, if you could propose one research question to the president at
your institution, what would it be?
 Write these down. This is important, we’ll be using these throughout the day.
Break
Strategizing and Game Planning
[Figure: X’s and O’s scattered across the slide. Scatterplot? Regression? Gameplan.]
Time to get into the X’s and O’s of Research Design.
Experimental Designs
So what is a traditional experiment?
R O X O Treatment group
R O
O Control group
R is random assignment
X is the treatment
Experiments are the gold standard against which all other research is evaluated.
Experimental Designs
Why are traditional experiments important? Especially since we rarely do them?
 Experiments form the mathematical foundation of causation
 Control group and treatment group are the same
 Only difference is presence of treatment
 Difference in performance is attributable to only the treatment
 Treatment causes the effect
Causality
What do you need in order to establish causality?
 Association
 Time Order
 Nonspuriousness
 Mechanism
 Context
Causality
What do you need in order to establish causality?
 Association
 Correlation between X and Y
 Increasing financial aid by $1000/yr increases the probability of graduation by 10%
 Direction
 Magnitude
Causality
What do you need in order to establish causality?
 Time Order
 Increasing financial aid by $1000/yr increases the probability of graduation by 10%
 Financial aid affects graduation
 Financial aid must come before graduation
Causality
What do you need in order to establish causality?
 Time Order
 What’s wrong with the following statement?
 “As retention rates increase by 1 percentage point, the average SAT score increases by 100 points.”
 High correlation, but reverse causality. The time order is reversed.
 Retention rates don’t cause SAT scores to increase. SAT scores might cause retention rates to increase.
Causality
What do you need in order to establish causality?
 Time Order
 What’s wrong with the following statement?
 “As high school GPA increases by 0.10, the average SAT score increases by 100 points.”
 High correlation, but simultaneity. Both things are happening at the same time.
 High school GPA doesn’t cause SAT scores to increase. SAT scores don’t cause high school GPA to increase. They are both highly correlated measures of the same principle, aptitude.
Causality
What do you need in order to establish causality?
 Nonspuriousness
 No outside factors could cause the relationship
[Chart: “Surfers Beware! Sharks Scream for Ice Cream.” Number of Shark Attacks plotted alongside Ice Cream Sales (in millions).]
Causality
What do you need in order to establish causality?
 Nonspuriousness
[Charts: Ice Cream Sales (in millions) and Shark Attacks, each plotted against Temperature.]
 No outside factors could cause the relationship
Causality
What do you need in order to establish causality?
 Mechanism
 How something happens.
 Temperature ↑ → People Get Hot → People Want to Cool Down → Ice Cream Sales ↑
 Temperature ↑ → People Get Hot → People Want to Cool Down → People Swim → Shark Attacks ↑
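A small simulation can make the spurious relationship concrete. This is a sketch on made-up data, not the data behind the charts: the coefficients, noise levels, and helper names are all assumptions. Temperature drives both series, and neither series affects the other.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def residuals(ys, xs):
    """What is left of y after removing a straight-line fit on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable
temperature = [rng.uniform(40, 100) for _ in range(2000)]
# Hypothetical mechanisms: both outcomes depend on temperature only.
ice_cream_sales = [5 * t + rng.gauss(0, 40) for t in temperature]
shark_attacks = [0.2 * t + rng.gauss(0, 2) for t in temperature]

raw_corr = pearson(ice_cream_sales, shark_attacks)
# Correlation once temperature has been removed from both series.
partial_corr = pearson(residuals(ice_cream_sales, temperature),
                       residuals(shark_attacks, temperature))

print(f"Ice cream vs. shark attacks: {raw_corr:.2f}")
print(f"After accounting for temperature: {partial_corr:.2f}")
```

The raw correlation is strong, but it largely disappears once temperature is accounted for, which is what a spurious relationship looks like in data.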
Causality
What do you need in order to establish causality?
 Context
 Context in which it happens.
 For whom? When? Under what conditions?
 Increasing financial aid by $1000/yr increases the probability of graduation by 10%
 Maybe it’s actually 12% for black students
 Maybe it’s only 6% for white students
 Maybe it’s only 3% at community colleges
Causality
What do you need in order to establish causality?
 Association
 Time Order
 Nonspuriousness
 Mechanism
 Context
Experimental Designs
Experimental designs help establish causality by isolating the effect of the treatment.
R O X O Treatment group
R O
O Control group
R is random assignment
X is the treatment
Experimental Designs
R O X O
R O   O
Association: X correlated to O.
The treatment is correlated to the post-test results. As the treatment changes, the post-test
results are likely to change.
Experimental Designs
R O X O
R O   O
Time Order: X associated with post-test results. Otherwise, results wouldn’t change.
Without treatment, the pre-test and post-test results should be the same.
Differences in the treatment group between the pre-test and post-test are therefore because X, the treatment, resulted in the change.
Experimental Designs
Non-spuriousness: Ensures equivalent groups.
R O X O
R O   O
Random assignment gives everyone the same probability of receiving either the treatment or the control, so the groups, if large enough, are not likely to differ.
Equivalent groups mean there is nothing else that could have caused the difference between the treatment and control groups.
Experimental Designs
R O X O
R O   O
Gives a baseline.
Pre-tests help to show that the groups are equivalent by establishing a baseline. If there are large
differences in the pre-test, the assignment may have failed.
Pre-tests also help to show the difference made by the treatment by comparing measures before
and after the treatment was administered.
Experimental Designs
R O X O
R O   O
Take the difference in the results.
Because the groups are randomly assigned, and the only difference between the groups is
whether they were treated or not, the treatment effect is simply the difference between those
who were treated and those who were not.
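The R O X O design can be sketched as a simulation. Everything here is hypothetical: the population, the noise, and the assumed true effect of 5 points are chosen only for illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: each student has a baseline (pre-test) score.
baseline = [rng.gauss(70, 10) for _ in range(2000)]
TRUE_EFFECT = 5.0  # assumed treatment effect, for illustration only

# R: random assignment, so every student has the same chance of treatment.
treated = [rng.random() < 0.5 for _ in baseline]

# Post-test: baseline plus noise, plus the effect for treated students only.
post = [b + rng.gauss(0, 3) + (TRUE_EFFECT if t else 0.0)
        for b, t in zip(baseline, treated)]

def group_mean(values, flags, wanted):
    """Mean of the values whose flag equals `wanted`."""
    return statistics.mean(v for v, f in zip(values, flags) if f == wanted)

# Pre-test gap: should be near zero if the assignment worked.
pre_gap = group_mean(baseline, treated, True) - group_mean(baseline, treated, False)
# Treatment effect: the difference between treated and untreated post-tests.
estimated_effect = group_mean(post, treated, True) - group_mean(post, treated, False)

print(f"Pre-test gap: {pre_gap:.2f}")
print(f"Estimated treatment effect: {estimated_effect:.2f}")
```

Because assignment is random, the pre-test gap stays close to zero and the simple post-test difference lands near the assumed effect.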
Experimental Designs
But there can be all sorts of different experimental designs:
R O O O O X O O O O
R O O O O   O O O O
Multiple observations over time help to account for environmental changes.
R O X O O O O O O O
R O   O O O O O O O
Multiple observations after the treatment look for a diminishing or lagged effect.
Experimental Designs
But there can be all sorts of different experimental designs:
R O X O X O X O
R O   O   O   O
Multiple treatments help to identify patterns.
R O X O
R O   O
R   X O
R     O
Eliminating a pre-test for select groups ensures that the pre-test did not influence the post-test results.
Experimental Designs
But there can be all sorts of different
experimental designs:
R O X1 O
R O X2 O
R O X3 O
R O    O
Different treatments can help isolate which
treatment is best.
Experimental Designs
No matter which design, they all have key common elements:
1. Treatment and Control Group
2. Before and After Observations
3. Random Assignment
Without these elements, the fundamental assumptions of experimental research break down
and we must use math to isolate the effect.
Experimental Designs
Break into teams.
Using your questions raised earlier, work out how you might design an experiment.
 Which type of design did you use?
 Why?
 How would you assign / select the participants?
 What are the potential pitfalls?
Experimental Designs
Example & Discussion.
Read the handout about the FAFSA Experiment.
 What is/are the treatment(s)?
 How were people assigned?
 Which design was used?
 What does this tell us about the application process for financial aid?
 How might you change the experiment?
 What else could be done to expand upon these findings?
Break
Did the introduction of our first-year orientation program in AY
2010 result in an increase to retention rates for participants as
compared to non-participants?
O X O Treatment group
O
O Control group
No random assignment. Students choose to participate.
OR
O X O Treatment population
No random assignment or control group.
Did the introduction of our first-year orientation program in AY
2010 result in an increase to retention rates for participants as
compared to non-participants?
X O Treatment group
O Control group
No random assignment. No pre-test. Students choose to
participate.
OR
X O Treatment population
No random assignment, pre-test, or control
group.
Did the introduction of our first-year orientation program in AY
2010 result in an increase to retention rates for participants as
compared to non-participants?
 Why can’t we just design an experiment?
Research Ethics
 Pitfalls of Experimental Designs
 Informed Consent
 Choice
 Do No Harm
 Human Subjects
No experiment is worth the results if people are manipulated against their will, have their privacy
invaded, or are harmed emotionally, psychologically, physically, or otherwise.
Research Ethics
Example & Discussion.
Read the handouts about the Stanford Prison Experiment and the Facebook Experiment.
 What are the ethical issues involved with each experiment?
 If you were part of IRB, would you approve the Stanford Prison Experiment?
 If you were part of IRB, would you approve the Facebook Experiment?
Research Ethics
 Pitfalls of Research Designs
 Data Fabrication
 Results Manipulation
 Program/Treatment Falsification
 Conflict of Interest
 Personal Gain
 Questions drive research. Answers do not.
Research Ethics
 University of Missouri at Kansas City (February 2015)
 “The University of Missouri at Kansas City gave the Princeton Review false information designed to inflate the rankings of its business school, which was
under pressure from its major donor to keep the ratings up…” Inside Higher Ed
 Georgia Institute of Technology (January 2015)
 “…a former tenured professor of electrical engineering at Georgia Institute of Technology, has been indicted on two counts of racketeering, based on
allegations that he poured some $1 million of university funds into his own tech company…” Inside Higher Ed
 University of North Carolina (October 2014)
 “Over nearly two decades, professors, coaches, and administrators either participated in the scheme or overlooked it, undercutting the core values of one of
the nation’s premier public universities.” Chronicle of Higher Education
 Tulane University (January 2013)
 “U.S. News & World Report has moved Tulane University’s business school to the “unranked” section of its business-school listings after the school’s recent
admission that it had inflated test scores and the number of completed applications to its full-time M.B.A. program for several years.” Chronicle of Higher
Education
Research Ethics
 George Washington University (November 2012)
 “George Washington officials said they later discovered that the admissions office had been estimating the class rank for high-performing students whom
they “assumed” were in the top 10 percent of their classes, based on their grade-point averages and standardized-test scores.” Chronicle of Higher Education
 Emory University (August 2012)
 “Emory University intentionally misreported its admissions data for more than a decade, with the knowledge and participation of the leadership of the
admission and institutional-research offices” Chronicle of Higher Education
 Claremont McKenna College (January 2012)
 “A senior administrator at Claremont McKenna College has resigned after admitting to falsely reporting SAT statistics since 2005…” Chronicle of Higher
Education
 American Psychological Association Ethics Code
 “authors should not submit manuscripts that have been published elsewhere in substantially similar form or with substantially similar content.”
[Diagram: Revised Garbage Can (Kingdon, 1995), with Problems, Solutions, Politics, Policymaker, and Decision.]
Research Ethics
We are researchers, not policymakers.
Questions drive research. Answers do not.
[Diagram: Problems, Politics, Research, Solutions.]
Research Ethics
Research should be objective and unbiased.
Let the research generate solutions and leave the politics of the decision making to the
policymakers.
[Diagram: Problems, Politics, Research, Solutions.]
Research Ethics
 Pitfalls of Research Designs
 Data Fabrication
 Results Manipulation
 Program/Treatment Falsification
 Conflict of Interest
 Personal Gain
 Questions drive research. Answers do not.
Do not let your research integrity be jeopardized by people, politics, or circumstances.
It’s unethical.
It’s illegal.
It invalidates everything you’ve ever done or will ever do.
Did the introduction of our first-year orientation program in AY
2010 result in an increase to retention rates for participants as
compared to non-participants?
 So back to our original question: Why can’t we just design an experiment?
 Choice
 Students should be allowed to decide if they want to participate
O X O
O   O
 Withholding a potential benefit
 Don’t harm non-participants if you know it’s a beneficial program
O X O
Since we don’t have an experiment, what can we do?
 The lack of an experiment threatens causality.
 Now the groups differ on either observable or unobservable characteristics.
 Differences between groups mean that the result may have been due to differences in populations and not due to the treatment.
Quasi-Experimental Designs
 Compare participants to non-participants?
 No! The reason why they participated often directly impacts the outcome.
 High achieving students are more likely to participate in the first-year orientation program, making it
look better than it actually is.
Participants vs. Non-Participants
“Participants performed better than non-participants…
…but they would have anyhow!
The program actually didn’t do anything.”
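Self-selection of this kind is easy to reproduce. In the sketch below, all values are made up: achievement is an unobserved trait that raises both the chance of signing up and the outcome, and the program itself is given an effect of exactly zero.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable

participants, non_participants = [], []
for _ in range(5000):
    achievement = rng.gauss(0, 1)  # hypothetical trait the analyst never sees
    # Self-selection: higher achievers are more likely to sign up.
    joins = achievement + rng.gauss(0, 1) > 0
    # The outcome depends on achievement only; the program adds nothing.
    outcome = 3.0 + 0.4 * achievement + rng.gauss(0, 0.3)
    (participants if joins else non_participants).append(outcome)

# Naive comparison of participants to non-participants.
naive_gap = statistics.mean(participants) - statistics.mean(non_participants)
print(f"Participants outperform non-participants by {naive_gap:.2f}")
```

The naive comparison reports a clear advantage for participants even though the program did nothing, because the two groups were different before anyone enrolled.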
Quasi-Experimental Designs
 Compare participants to non-participants?
 No! The reason why they participated often directly impacts the outcome.
 Low achieving students are more likely to participate in remedial education, making it look worse than it
actually is.
Policy: “Students with less than a 500 on their SAT Math are required to enroll in MATH 0150.”
GPA of MATH 0150 Students = 2.0
GPA of Non-MATH 0150 Students = 3.0
Non-Analyst: “MATH 0150 is associated with a lower GPA. We should eliminate MATH 0150.”
Analyst Response: “Had these students not enrolled in MATH 0150, their GPA would have been 1.5.
MATH 0150 actually improves GPA by 0.5!”
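The two readings of the same numbers differ only in what the MATH 0150 GPA is compared against. A minimal sketch, taking the analyst's counterfactual GPA of 1.5 as given (it is an estimate, not an observed value):

```python
gpa_math_0150 = 2.0       # observed GPA of MATH 0150 students
gpa_others = 3.0          # observed GPA of students not in MATH 0150
gpa_counterfactual = 1.5  # analyst's estimate had they not enrolled

# Non-analyst: compares participants to non-participants.
naive_difference = gpa_math_0150 - gpa_others          # negative
# Analyst: compares participants to their own counterfactual.
treatment_effect = gpa_math_0150 - gpa_counterfactual  # positive

print(f"Naive difference: {naive_difference:+.1f}")
print(f"Effect against the counterfactual: {treatment_effect:+.1f}")
```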
Quasi-Experimental Designs
 Compare participants to non-participants?
 No! The reason why they participated often directly impacts the outcome.
 High achieving students are more likely to participate in the first-year orientation program, making it
look better than it actually is.
 Low achieving students are more likely to participate in remedial education, making it look worse than it
actually is.
 Need to identify the counterfactual – what would have happened had the program never been
introduced.
 Yet the simple comparison between participants and non-participants is what we oftentimes
present.
Quasi-Experimental Designs
In essence, because the groups differ, we’re not interested in comparing
Participants to Non-Participants
Instead, we want to try to compare the results of participants to what would have happened had they not received treatment.
Participants Receiving Treatment to Alternate Reality without Treatment
As you sci-fi fans can imagine, this might be a bit difficult given our current understanding of physics.
Quasi-Experimental Designs
Since we can’t compare alternate realities, we have to try to develop a control group that resembles the treatment group as closely as possible.
Participants to Non-Participants
But no matter how closely we try to match the groups, without random assignment, there will
always be slight differences between the groups. There is no perfect match. Even if all the
measured variables match, there will always be unobserved characteristics that cannot be
captured.
The groups look equivalent…
Characteristic   Treatment Group   Control Group
Age              18.6              18.7
Female           0.61              0.60
Asian            0.11              0.12
Black            0.13              0.12
Hispanic         0.13              0.13
White            0.60              0.61
HS GPA           3.86              3.84
SAT Verbal       604               606
SAT Math         711               709
Then why might the treatment group have decided to participate in orientation while the
control group did not?
TREATMENT GROUP              CONTROL GROUP
 Extrovert                  Introvert
 Social                     Isolated
 School Pride / Spirit      Career Oriented
 Extrinsically Motivated    Intrinsically Motivated
None of these are being measured! The groups will differ.
Quasi-Experimental Designs
 Compare participants to non-participants after controlling for their characteristics?
 It’s a start. Use regressions to control for student characteristics.
 Students who participated in the first-year orientation program were 5% more likely to be retained for
their second year after controlling for academic preparation.
 But we can’t measure everything.
 Multivariate and Logistic Regression Models
Quasi-Experimental Designs
RETENTION = ƒ (treatment, age, gender, race/ethnicity, … )
OR
RETENTION = β0 + β1 treatment + β2 age + β3 gender + β4 race/ethnicity + …
OR
RETENTION = 0.0013 + 0.05 treatment – 0.01 age + 0.03 gender + 0.02 race/ethnicity + …
The coefficient on treatment (0.05) is the coefficient of interest.
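A regression of this form can be sketched on simulated data. The assumptions are loud ones: the data are made up, the only control is a hypothetical high school GPA, the true treatment coefficient is set to 0.05, and the fit is a linear probability model by ordinary least squares rather than a logistic regression. The `ols` helper is written out so the example has no dependencies.

```python
import random

def ols(rows, y):
    """Ordinary least squares by solving the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

rng = random.Random(1)  # fixed seed so the run is repeatable
rows, retained = [], []
for _ in range(4000):
    hs_gpa = rng.uniform(2.0, 4.0)
    # Self-selection: students with higher GPAs participate more often.
    treatment = 1 if rng.random() < (hs_gpa - 1.5) / 3.0 else 0
    # Assumed model: treatment adds 0.05 to the retention probability.
    p_retained = 0.40 + 0.05 * treatment + 0.10 * hs_gpa
    retained.append(1 if rng.random() < p_retained else 0)
    rows.append([1.0, treatment, hs_gpa])  # intercept, treatment, control

b0, b_treatment, b_gpa = ols(rows, retained)

def retention_rate(flag):
    """Share retained among students whose treatment flag equals `flag`."""
    group = [yi for row, yi in zip(rows, retained) if row[1] == flag]
    return sum(group) / len(group)

naive_gap = retention_rate(1) - retention_rate(0)
print(f"Naive gap: {naive_gap:.3f}")
print(f"Treatment coefficient after controlling for GPA: {b_treatment:.3f}")
```

The naive gap overstates the program because participants had higher GPAs to begin with; controlling for GPA pulls the estimate back toward the assumed 0.05. It can only do so because GPA is measured here, which is the limit the slides keep returning to.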
Quasi-Experimental Designs
 Match participants to non-participants based on their characteristics?
 Better! Match participants to non-participants based on their characteristics and then compare the
results.
 Students who participated in the first-year orientation program were 2.8% more likely to be retained for
their second year when compared to matched non-participants.
 Yet again, still can’t measure everything.
 Not everyone has a match.
 Propensity Score Matching
Quasi-Experimental Designs
TREATMENT GROUP   CONTROL GROUP   TREATMENT EFFECT
82                79              3
86                82              4
85                81              4
82                80              2
81                79              2
80                78              2
Average treatment effect: 2.83
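Once each participant has a matched non-participant, the treatment effect is the average of the within-pair differences. The sketch below uses the six pairs shown above; it covers only this final step and assumes the matching itself (for example, by propensity score) has already been done.

```python
# Outcomes for the six matched pairs shown above.
treatment_scores = [82, 86, 85, 82, 81, 80]
control_scores = [79, 82, 81, 80, 79, 78]

# Difference within each matched pair.
pair_effects = [t - c for t, c in zip(treatment_scores, control_scores)]

# Average of the pair differences.
average_effect = sum(pair_effects) / len(pair_effects)

print(pair_effects)              # [3, 4, 4, 2, 2, 2]
print(round(average_effect, 2))  # 2.83
```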
Quasi-Experimental Designs
 Look at changes in retention rates over time?
 Sure, but cautiously.
 Look at the trajectory before the first-year orientation was introduced. Then look at the trajectory after
the first-year orientation was introduced. Look for a jump in retention rates.
 But there’s a lot out there that could have also happened during this time to explain any jumps, so you’ll
need to take some additional steps to isolate the effect of the orientation program.
 Longitudinal and Time-Series Analyses
 Fixed Effects
 Difference-in-Differences
Since the introduction of the program in 2010, retention rates have risen…
[Chart: Retention Rates (axis 70 to 86) by year, 2006 to 2014, showing only the observations from the program onward: X O O O O O. Program Success!?]
But at a lower rate than they were before the program was introduced.
[Chart: Retention Rates (axis 70 to 86) by year, 2006 to 2014, with observations before and after the program: O O O O X O O O O O. Program Failure!]
An example with treatment and control groups.
[Chart: Retention Rates (axis 74 to 84) by year, 2006 to 2014, for a Treatment Group (O O O O X O O O O O) and a Control Group (O O O O   O O O O O), with the Treatment Effect marked.]
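With a treatment group and a control group observed before and after the program, the comparison can be sketched as a difference-in-differences. The retention rates below are hypothetical, not read off the chart, and the estimate leans on the assumption that both groups would have followed the same trend without the program.

```python
# Hypothetical average retention rates (percent) before and after 2010.
treatment_before, treatment_after = 78.0, 82.0
control_before, control_after = 77.0, 79.0

# Change over time within each group.
treatment_change = treatment_after - treatment_before
control_change = control_after - control_before

# The control group's change stands in for what would have happened anyway.
did_estimate = treatment_change - control_change

print(f"Treatment group change: {treatment_change:+.1f}")
print(f"Control group change: {control_change:+.1f}")
print(f"Difference-in-differences estimate: {did_estimate:+.1f}")
```

A before-and-after look at the treatment group alone would credit the program with the full change; subtracting the control group's change removes the part that happened to everyone.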
Quasi-Experimental Designs
The most important part of quasi-experimental designs is the DESIGN.
 If there is no design, you’ve limited yourself to post hoc analysis.
 Spend time designing the implementation.
 Treatment and control groups
 Pre-test and post-test
 Timing
 Scale
 Pilot programs
Quasi-Experimental Designs
The better the design, the more tools at your disposal.
 Program Evaluation
 Reliability – Are the methods of measurement consistent?
 Validity – Are the results measured correctly?
 Implementation Fidelity – Is the program being run according to plan?
 Treatment Effect – Is the treatment having the intended effect?
 Cost-Benefit – Are the results worth the cost?
Pop Quiz
QUESTION
METHOD
Does participation in the undergraduate
research program lead to higher GPA’s?
Group Comparison
Regression
Matching
Time-Series Analysis
None of the Above
Why?
What’s the underlying issue?
Pop Quiz
QUESTION
METHOD
Are Hispanic students more likely to major in
Industrial Engineering?
Group Comparison
Regression
Matching
Time-Series Analysis
None of the Above
Why?
What’s the underlying issue?
Pop Quiz
QUESTION
METHOD
Did the implementation of the mandatory
requirement to enroll in a Freshman seminar
improve retention rates?
Group Comparison
Regression
Matching
Time-Series Analysis
None of the Above
Why?
What’s the underlying issue?
Pop Quiz
QUESTION
METHOD
Are West Point students who attend the Army-Navy game more likely to be retained?
Group Comparison
Regression
Matching
Time-Series Analysis
None of the Above
Why?
What’s the underlying issue?
Pop Quiz
QUESTION
METHOD
Do athletes who receive tutoring have higher
GPA’s than those who do not?
Group Comparison
Regression
Matching
Time-Series Analysis
None of the Above
Why?
What’s the underlying issue?
Pop Quiz
QUESTION
METHOD
Do faculty in California earn more than faculty
in New York?
Group Comparison
Regression
Matching
Time-Series Analysis
None of the Above
Why?
What’s the underlying issue?
Pop Quiz
QUESTION
METHOD
Does participation in Greek life lead to a
higher probability of graduation within 6
years?
Group Comparison
Regression
Matching
Time-Series Analysis
None of the Above
Why?
What’s the underlying issue?
Quasi-Experimental Designs
Break into teams.
Using your questions raised earlier, work out how you might address the experimental shortcomings.
 How would you design your program?
 Treatment. Control. Pilot. Timing. Etc.
 Which type of quasi-experimental design would you use?
 Group comparison. Regression. Matching. Time Series. Etc.
 Why?
Break
Math from a Fire Hose
PLEASE DOWNLOAD SHEPHERD_ECONOMETRICS_DATASETS.XLSX FOR EXAMPLE DATA
PLEASE DOWNLOAD SHEPHERD_ECONOMETRICS_EXERCISES.XLSX FOR EXERCISE DATA
Math from a Fire Hose
 You’ll need to install the following Add-In to Microsoft Excel if you’re not planning on using SAS.
 File → Options → Add-Ins → Analysis ToolPak → Go → Check Analysis ToolPak → OK
Math from a Fire Hose
 Data
Math from a Fire Hose
 We‘ve covered:
 Developing Research Questions
 What makes a good question.
 How the question drives the research.
 Why clear, specific questions are important.
Math from a Fire Hose
 We’ve covered:
 Developing Research Questions
 Sources of Bias
 The issues in trying to address questions without considering statistics.
Math from a Fire Hose
 We’ve covered:
 Developing Research Questions
 Sources of Bias
 Causality
 Why the sources of bias are so important.
 Why correlation does not equal causality.
 Why causality is the goal.
Math from a Fire Hose
 We’ve covered:
 Developing Research Questions
 Sources of Bias
 Causality
 Experimental Designs
 Why experiments are the gold standard to determine causality.
 Why it’s so hard to design and establish.
Math from a Fire Hose
 We’ve covered:
 Developing Research Questions
 Sources of Bias
 Causality
 Experimental Designs
 Research Ethics
 Where research goes wrong.
Math from a Fire Hose
 We’ve covered:
 Developing Research Questions
 Sources of Bias
 Causality
 Experimental Designs
 Research Ethics
 Quasi-Experimental Designs
 Methods to address research questions when there is no experiment.
 Methods to address research questions when there is no design at all (post hoc).
 Methods to evaluate programs.
Math from a Fire Hose
 We’ve covered:
 Developing Research Questions
 Sources of Bias
 Causality
 Experimental Designs
 Research Ethics
 Quasi-Experimental Designs
Now it’s time to address how each of these work in practice.
Math from a Fire Hose
 Data Types
 Confidence Intervals
 t-tests
 Regression
 Multivariate Regression
 Logistic Regression
 Propensity Score Matching
 Time-Series Analysis
Data Types
 Character / String
 Qualitative
 Feelings, emotions, narrative, politics, “how”, and other aspects that cannot be captured in numbers.
 Jackie feels very motivated when she gets to study a subject she enjoys.
 Decisions are typically made by forming committees where the chair directs discussion.
Data Types
 Character / String
 Qualitative
 Categorical
 A non-numeric category which we will then try to turn into a numeric value.
 Class
 Freshman. Sophomore. Junior. Senior.
 Freshman = 1
Sophomore = 2
Junior = 3
Senior = 4
Data Types
 Character / String
 Qualitative
 Categorical
 Numeric
 Binary
 Yes / No answers coded as 1 / 0
 Gender
 Male. Female.
 MALE → Male = 1, Female = 0
 Class
 Freshman. Sophomore. Junior. Senior.
 FRESHMAN → Freshman = 1, Sophomore = 0, Junior = 0, Senior = 0
 SOPHOMORE → Freshman = 0, Sophomore = 1, Junior = 0, Senior = 0
Data Types
 Character / String
 Qualitative
 Categorical
 Numeric
 Binary
 Discrete
 Integers
 Counts
 Enrollment = 22,491
 Cannot have an enrollment count of 22,490.56
Data Types
 Character / String
 Qualitative
 Categorical
 Numeric
 Binary
 Discrete
 Continuous
 Fractions and Decimals
 GPA
 GPA = 3.86525
 Often rounded to 2 decimal places (3.87)
 Money
 4.125% return on $1,256,547 = $1,308,379.56375
 Often rounded to the dollar or penny ($1,308,379.56)
Data Types
 Character / String
 Qualitative
 Categorical
 Numeric
 Binary
 Discrete
 Continuous
 NOIR (Nominal, Ordinal, Interval, Ratio)
Data Types
 Nominal
 Data order has no meaning.
 Race/Ethnicity
1 = White 2 = Black 3 = Hispanic 4 = Asian 5 = Other
Could just as easily have been
1 = Asian 2 = Black 3 = Hispanic 4 = Other 5 = White
Data Types
 Nominal
 Ordinal
 Data order matters, but has a subjective scale
 Likert Scale
1 = Strongly Disagree 2 = Disagree 3 = Neutral 4 = Agree 5 = Strongly Agree
How much is the difference between Strongly Disagree and Disagree?
The same as the distance between Agree and Strongly Agree?
The same as the distance between Disagree and Neutral?
Data Types
 Nominal
 Ordinal
 Interval
 Data order has meaning and equal units, but no ratio scale
 Temperature
90° to 91° equals 1°
45° to 46° equals 1°
But 90° is not twice as hot as 45°
Data Types
 Nominal
 Ordinal
 Interval
 Ratio
 Data order has meaning, units are equal, and ratio scales are true
 Money
$1,000 to $1,001 is $1
$500 to $501 is $1
$1,000 is twice as much money as $500
Confidence Intervals
[Figure: Normal Distribution curve with Mean = 0 and S.D. = 1; x-axis from -4 to 4]
Confidence Intervals
[Figure: normal distribution curve; total area under the curve = 1; x-axis from -4 to 4]
90% Confidence Interval
[Figure: normal distribution curve with an area of 0.05 in each tail; x-axis from -4 to 4]
A 90% CI corresponds to α = 0.10 (the sum of the two tail areas)
95% Confidence Interval
[Figure: normal distribution curve with an area of 0.025 in each tail; x-axis from -4 to 4]
A 95% CI corresponds to α = 0.05 (this is the standard level of statistical significance)
95% Confidence Interval
[Figure: normal distribution curve with the 95% confidence interval marked; x-axis from -4 to 4]
Results out here (in the tails) are unlikely to be due to chance.
Confidence Intervals
 A 5% significance level means that, if there were truly no effect, results this extreme would occur by chance less than 5% of the time.
 α = 0.05
 p < 0.05 means the results are statistically significant (unlikely to be due to chance alone)
This is also affected by the sample size
 The greater the sample size, the greater the likelihood that statistical significance can arise
 Type I Error – the results are significant when no real relationship exists (false positive)
 Type II Error – the results are not significant when a real relationship exists (false negative)
 Would rather find no relationship when one actually exists than find a relationship
that isn’t actually there
 Use at least the 5% significance level (α = 0.05), if not stricter (e.g., α = 0.01)
 The stricter the significance level (the smaller the α), the stronger the evidence that the results are not due to chance.
 p < 0.001 is highly significant
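The link between a confidence level, α, and a two-tailed p-value can be sketched in a few lines of Python. This is only an illustration using the standard normal curve (Mean = 0, S.D. = 1); the function names are made up for this example and are not part of the workshop files.

```python
from statistics import NormalDist

# Standard normal curve: Mean = 0, S.D. = 1
std_normal = NormalDist(mu=0, sigma=1)

def critical_z(confidence):
    """Cutoff on the x-axis that leaves alpha / 2 in each tail."""
    alpha = 1 - confidence
    return std_normal.inv_cdf(1 - alpha / 2)

def two_tail_p(z):
    """Area in both tails beyond +/- z."""
    return 2 * (1 - std_normal.cdf(abs(z)))

z_90 = critical_z(0.90)     # about 1.645 (0.05 in each tail)
z_95 = critical_z(0.95)     # about 1.960 (0.025 in each tail)
p_at_1 = two_tail_p(1.0)    # about 0.32, not significant at alpha = 0.05
p_at_z95 = two_tail_p(z_95) # 0.05 by construction
```

A result one standard deviation from the mean gives a two-tailed p of roughly 0.32, well above α = 0.05.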
Confidence Intervals
 A 5% significance level means that, if there were truly no effect, results this extreme would occur by chance less than 5% of the time.
 α = 0.05
 p < 0.05 means the results are statistically significant (unlikely to be due to chance alone)
[Figure: normal distribution curve with 2-tail p ≈ 0.32; x-axis from -4 to 4]
Results are not significant. We cannot conclude that the means differ. The difference could be due to chance.
Confidence Intervals
 A 5% significance level means that, if there were truly no effect, results this extreme would occur by chance less than 5% of the time.
 α = 0.05
 p < 0.05 means the results are statistically significant (unlikely to be due to chance alone)
[Figure: normal distribution curve with 2-tail p ≈ 0.035; x-axis from -4 to 4]
Results are significant. Means differ. Not likely due to chance.
Confidence Intervals
 A 5% significance level means that, if there were truly no effect, results this extreme would occur by chance less than 5% of the time.
 α = 0.05
 p < 0.05 means the coefficient is statistically different from 0
[Figure: coefficients for X1 and X2 plotted with their 95% confidence intervals; y-axis from -3 to 1]
Coefficient and 95% CI contain 0. Not significant. Results could equal 0.
Coefficient and 95% CI do not contain 0. Significant.
t-tests
 Group comparisons
 Looks for a difference in mean values, considering the variation of each group
$$t\text{-score} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$
t-tests
 Group comparisons
 Looks for a difference in mean values, considering the variation of each group
$$t\text{-score} = \frac{\text{mean}_1 - \text{mean}_2}{\sqrt{\dfrac{\text{variance}_1}{\text{sample size}_1} + \dfrac{\text{variance}_2}{\text{sample size}_2}}}$$
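The t-score formula can be worked through by hand in Python. The sketch below assumes two independent samples with unequal variances; the scores are made up for illustration and are not from the workshop workbooks. The degrees-of-freedom line uses the Welch-Satterthwaite approximation, which the slides do not cover and is included only for completeness.

```python
import math
import statistics

def welch_t(sample1, sample2):
    """Return (t_score, degrees_of_freedom) for two independent samples."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)
    # Sample variances (n - 1 in the denominator)
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)
    # variance / sample size for each group
    v1, v2 = var1 / n1, var2 / n2
    t_score = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation (an addition, not from the slides)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_score, df

# Hypothetical test scores for a control group and a treatment group
control = [70, 72, 68, 75, 71, 69]
treated = [80, 85, 78, 90, 82, 79]
t_score, df = welch_t(treated, control)
```

The larger the t-score in absolute value, the smaller the p-value; Excel's Data Analysis t-test reports both.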
t-tests
 Group comparisons
 Good to use when comparing samples of the same population
 Differences between a pre-test and a post-test
 Differences between independent samples
 Good to use when estimating regression coefficients
 Whether a coefficient is statistically significant
 Excel Example
t-tests
Do they have equal variances?
Not usually.
Do they have equal
sample sizes?
t-tests
p > 0.05
Not Significant
No difference in means
t-tests
t-tests
p < 0.05
Significant
9.83 difference in means
t-tests
t-tests
p > 0.05
Not Significant
t-tests
 Example 1: There was no statistically significant difference in the test scores.
 The treatment had no effect on test scores.
 Example 2: There was a statistically significant difference in the test scores.
 The treatment is associated with a 10 point increase in test scores, on average.
 Example 3: There was no statistically significant difference in the samples.
 Despite a 10 point difference in the means, there was no statistically significant difference due to the
differences in sample sizes and the large variation.
 The means could actually be the same. Fail to reject the notion that they are equal.
t-tests – Exercise
 Open the t-test tab in the Exercises workbook and determine if there are differences in the
means
 X1 and X2
 X2 and X3
 X1 and X3
 Is there a statistically significant difference?
 What is the magnitude of the difference?
 When finished, take a short break.
Regression
 Basic Correlation
 Looks at the relationship between an independent variable and dependent variable
𝑦 = β0 + β1 𝑋
 y is the dependent variable
 X is the independent variable
 β0 is the intercept
 β1 is the slope (also called the coefficient or effect size)
Regression
 Basic Correlation
 Looks at the relationship between an independent variable and dependent variable
𝑦 = β0 + β1 𝑋
 A 1 unit change in X is associated with a β1 change in 𝑦
 Can also be used to predict values within the minimum and maximum of X
 If X = 75 then 𝑦 = β0 + β1 (75)
Excel Example
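For those not using Excel, the intercept and slope can be derived mathematically with ordinary least squares. This is a minimal sketch with made-up data; the function name is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x. Returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariation of X and y divided by variation of X
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    # The fitted line always passes through (mean_x, mean_y)
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Made-up data that follows y = 3 + 2x exactly
xs = [10, 20, 30, 40, 50]
ys = [23, 43, 63, 83, 103]
b0, b1 = fit_line(xs, ys)
```

With real data the points will not fall exactly on the line, which is why the standard errors and p-values in the regression output matter.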
Regression
Data  Data Analysis  Regression
Regression
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.94284
R Square             0.888947
Adjusted R Square    0.88498
Standard Error       25.66225
Observations         30

ANOVA
              df    SS          MS          F           Significance F
Regression    1     147601.8    147601.8    224.1311    6.85E-15
Residual      28    18439.43    658.5511
Total         29    166041.2

                Coefficients    Standard Error    t Stat      P-value     Lower 95%    Upper 95%    Lower 95.0%    Upper 95.0%
Intercept       26.70054        10.08497          2.647559    0.013163    6.042424     47.35866     6.042424       47.35866
X Variable 1    2.430899        0.162374          14.97101    6.85E-15    2.098292     2.763507     2.098292       2.763507
Regression
If p > 0.05 then the model is no good.
The goodness of fit for the line.
The most important part.
Regression
This is β0.
This is β1.
𝑦 = 26.701 + 2.431 𝑋
Which is exactly what we got in Excel, but with much more detail.
Regression
Remember the t-tests?
Our t-values = β / SE
Which yields our p-values
for significance.
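The t Stat and the 95% bounds in the Excel output can be reproduced from the coefficient and its standard error. The sketch below uses the X Variable 1 row from the output; the critical value 2.0484 is the standard two-tailed 5% t value for 28 degrees of freedom, hard-coded here as an assumption because the Python standard library has no t-distribution.

```python
# Values from the "X Variable 1" row of the regression output
coefficient = 2.430899
standard_error = 0.162374

# t-value = coefficient / standard error
t_stat = coefficient / standard_error

# Two-tailed 5% critical value of t with 28 residual degrees of freedom
# (standard table value, hard-coded as an assumption)
T_CRITICAL_28 = 2.0484

lower_95 = coefficient - T_CRITICAL_28 * standard_error
upper_95 = coefficient + T_CRITICAL_28 * standard_error
```

Because the interval from lower_95 to upper_95 does not contain 0, the coefficient is significant at α = 0.05.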
Regression
Since p < 0.05, X is a significant predictor of Y.
If p > 0.05, then X and Y have no significant relationship.
A 1 unit change in X isn’t likely to affect Y.
Regression
𝑦 = 26.701 + 2.431 𝑋
A 1 unit change in X is associated with a 2.431 increase to Y.
Regression
𝑦 = 26.701 + 2.431 𝑋
If X = 75, what would we expect Y to be?
1. Make sure that 75 is within the minimum and maximum values of X.
Models can fail if trying to predict values outside of observed values.
2. If 75 is within these values, then…
𝑦 = 26.701 + 2.431(75) = 209.026
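The two steps above can be sketched as a small function that refuses to predict outside the observed range. The minimum and maximum of X used here (10 and 100) are assumptions for illustration; check the actual values in your data.

```python
def predict(x, x_min, x_max, intercept=26.701, slope=2.431):
    """Predict y from the fitted line, but only inside the observed X range."""
    # Step 1: make sure x is within the minimum and maximum values of X
    if not (x_min <= x <= x_max):
        raise ValueError(f"X = {x} is outside the observed range {x_min} to {x_max}")
    # Step 2: plug x into y = intercept + slope * x
    return intercept + slope * x

# Assumed observed range of X (hypothetical values)
expected_y = predict(75, x_min=10, x_max=100)
```

Calling predict(500, x_min=10, x_max=100) raises an error instead of returning an extrapolated value.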
Regression
[Figure: scatter plot with fitted line y = 2.4309x + 26.701, R² = 0.8889; x-axis from 0 to 120, y-axis from 0 to 300]
2. If 75 is within these values, then…
𝑦 = 26.701 + 2.431(75) = 209.026
Regression
In this example, I’ve built in a ton of variation.
Regression
Results are still significant, but biased. We
know the relationship is Y = 2X, but the extra
variation is throwing off our estimate.
Regression
[Figure: scatter plot with fitted line y = 4.2341x + 424.06, R² = 0.1673; x-axis from 0 to 120, y-axis from 0 to 1200]
Regression - Exercise
 Open the reg tab in the Exercises workbook and estimate the relationship between SAT and
Freshman GPA (YR1_GPA).
 What is the magnitude of the relationship?
 Is it statistically significant?
 What is the expected Freshman GPA of a student who earned a 1450 on the SAT?
 Try to derive these results both mathematically and using Data Analysis.
 When finished, take a short break.
Multivariate Regression
 Basic Correlation
 Looks at the relationship between multiple independent variables and the dependent variable
𝑦 = β0 + β1 𝑋1 + β2 𝑋2 + β3 𝑋3 + …
 y is the dependent variable
 𝑋1 is an independent variable
 𝑋2 is an independent variable
 𝑋3 is an independent variable
Multivariate Regression
 Basic Correlation
 Looks at the relationship between multiple independent variables and the dependent variable
𝑦 = β0 + β1 𝑋1 + β2 𝑋2 + β3 𝑋3 + …
 β0 is the intercept
 β1 is the slope for 𝑋1
 β2 is the slope for 𝑋2
 β3 is the slope for 𝑋3
Excel Example (mreg1 & mreg2)
Multivariate Regression
Statistic              BIVARIATE REGRESSION    MULTIVARIATE REGRESSION
Mean of X / X1         55.00                   55.00
S.D. of X / X1         29.35                   29.35
Variance of X / X1     861.31                  861.31
Mean of X2             N/A                     50.40
S.D. of X2             N/A                     28.21
Variance of X2         N/A                     795.77
Mean of Y              160.40                  160.40
S.D. of Y              75.67                   75.67
Variance of Y          5725.56                 5725.56
The data is exactly the same. The only change is the addition of X2.
Multivariate Regression
BIVARIATE REGRESSION
MULTIVARIATE REGRESSION
I built the model with the formula Y = 2X + Error.
Because we are not controlling for the error, the
estimate for X is biased.
When we control for the source of the bias (an
omitted variable), we get the true estimate for X1.
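Omitted variable bias can be demonstrated with a tiny constructed data set. This is a sketch, not the workshop's mreg data: Y is built as 2 × X1 + 3 × X2 with made-up values, and X2 is deliberately correlated with X1. Leaving X2 out inflates the slope on X1; including it recovers the true value of 2.

```python
def centered_sum(a, b):
    """Sum of (a - mean_a) * (b - mean_b) across observations."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))

# Constructed data (hypothetical): X2 is correlated with X1
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [2 * a + 3 * b for a, b in zip(x1, x2)]

s11, s22, s12 = centered_sum(x1, x1), centered_sum(x2, x2), centered_sum(x1, x2)
s1y, s2y = centered_sum(x1, y), centered_sum(x2, y)

# Bivariate regression: X2 is omitted, so X1 absorbs part of its effect
bivariate_slope = s1y / s11

# Multivariate regression: solve the two normal equations for both slopes
determinant = s11 * s22 - s12 ** 2
slope_x1 = (s22 * s1y - s12 * s2y) / determinant
slope_x2 = (s11 * s2y - s12 * s1y) / determinant
```

Here the bivariate slope comes out well above 2, while the multivariate slopes are 2 and 3, matching how the data were built.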
Multivariate Regression
BIVARIATE REGRESSION
MULTIVARIATE REGRESSION
I built the model with the formula Y = 2X + Error.
This is especially important when the omitted
variable has a large influence on Y or there is a
lot of variation in the variables.
Again, we get the true value of X1 once we
control for the source of the bias.
Multivariate Regression
BIVARIATE REGRESSION
MULTIVARIATE REGRESSION
I built the model with the formula Y = 2X + Error.
As can be seen, if we don’t think statistically, we could get an estimate that is nowhere near the truth.
Multivariate Regression
 If omitted variables and other sources of bias are so important, what do we need to include in
our models?
𝑦 = β0 + β1 𝑋1 + β2 𝑋2 + β3 𝑋3 + …
GPA = β0 + β1 TREATMENT + β2 ACADEMIC_PREP + β3 CONTROLS
 Use theory and established literature to develop your models.
 Use the most parsimonious model.
 Focus on the TREATMENT variable in evaluation designs.
Multivariate Regression
 Example
𝑦 = β0 + β1 𝑋1 + β2 𝑋2 + β3 𝑋3 + …
GPA = β0 + β1 TREATMENT + β2 ACADEMIC_PREP + β3 CONTROLS
 What should be included as controls?
 Anything that could possibly affect 𝑦
 Non-spuriousness
 Anything that could affect the results of other 𝑋’s
 Independence
Multivariate Regression
 Example
𝑦 = β0 + β1 𝑋1 + β2 𝑋2 + β3 𝑋3 + …
GPA = β0 + β1 TREATMENT + β2 ACADEMIC_PREP + β3 CONTROLS
 What shouldn’t be included as controls?
 Combinations of another 𝑋
 Don’t include SAT Math, SAT Verbal, and SAT Composite because SAT Composite = SAT Math + SAT Verbal
 Combinations of a categorical 𝑋
 Don’t include controls for both Male and Female because if you’re not Male, you’re Female
 In essence, don’t double count your 𝑋 variables
Multivariate Regression
 Example
𝑦 = β0 + β1 𝑋1 + β2 𝑋2 + β3 𝑋3 + …
GPA = β0 + β1 TREATMENT + β2 ACADEMIC_PREP + β3 CONTROLS
 Let’s brainstorm some ideas for GPA.
 SAT Score
 On Campus vs Off Campus Housing
 HS GPA
 Race / Ethnicity
 Pell Recipient
 Gender
 Greek Life
Multivariate Regression
 More about categorical 𝑋 variables
Student ID    Race / Ethnicity
001           White
002           Black
003           White
004           Asian
005           Hispanic
006           White
007           White
 How do we control for race/ethnicity?
 White
 Black
 Hispanic
 Asian
 Other
 Need to select a comparison group
 Tend to select the largest group
 Need to create binary variables for each category
Multivariate Regression
 More about categorical 𝑋 variables
Student ID    Race / Ethnicity    White    Black    Hispanic    Asian    Other
001           White               1        0        0           0        0
002           Black               0        1        0           0        0
003           White               1        0        0           0        0
004           Asian               0        0        0           1        0
005           Hispanic            0        0        1           0        0
006           White               1        0        0           0        0
007           White               1        0        0           0        0
 How do we control for race/ethnicity?
 White
 Black
 Hispanic
 Asian
 Other
 Comparison Group = White
 Coefficients compare each group to
the selected comparison group
 For example, Hispanic students are associated
with a higher GPA than white students
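Creating the binary variables and leaving out the comparison group can be sketched as follows. The function name is hypothetical; the seven students and the five categories are the ones shown on the slide.

```python
def dummy_code(values, categories, comparison_group):
    """Create one 1/0 variable per category, omitting the comparison group."""
    kept = [c for c in categories if c != comparison_group]
    return [{c.upper(): int(value == c) for c in kept} for value in values]

categories = ["White", "Black", "Hispanic", "Asian", "Other"]
race = ["White", "Black", "White", "Asian", "Hispanic", "White", "White"]

# White is the largest group, so it serves as the comparison group
rows = dummy_code(race, categories, comparison_group="White")
```

A White student has 0 in every column, so each remaining coefficient is read as a difference from White students.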
Multivariate Regression
 Example
𝑦 = β0 + β1 𝑋1 + β2 𝑋2 + β3 𝑋3 + …
GPA = β0 + β1 TREATMENT + β2 ACADEMIC_PREP + β3 CONTROLS
GPA = β0 + β1 TREATMENT + β2 SAT + β3 HSGPA + β4 PELL + β5 GREEK + β6 HOUSING
+ β7 RACE + β8 GENDER …
 Name your binary variables after the affirmative.
GPA = β0 + β1 TREATMENT + β2 SAT + β3 HSGPA + β4 PELL + β5 GREEK + β6 ONCAMPUS
+ β7 BLACK + β8 HISPANIC + β9 ASIAN + β10 OTHER + β11 MALE …
Multivariate Regression
 Example
GPA = β0 + β1 TREATMENT + β2 SAT + β3 HSGPA + β4 PELL + β5 GREEK + β6 ONCAMPUS
+ β7 BLACK + β8 HISPANIC + β9 ASIAN + β10 OTHER + β11 MALE …
 Build Up Approach
Model 1: GPA = β0 + β1 TREATMENT
Model 2 (Add Academic Preparation): GPA = β0 + β1 TREATMENT + β2 SAT + β3 HSGPA
Model 3 (Add Race/Ethnicity & Gender): GPA = β0 + β1 TREATMENT + β2 SAT + β3 HSGPA + β4 BLACK + β5 HISPANIC
+ β6 ASIAN + β7 OTHER + β8 MALE …
Full Model (Add Other Student Characteristics): GPA = β0 + β1 TREATMENT + β2 SAT + β3 HSGPA + β4 PELL + β5 GREEK
+ β6 ONCAMPUS + β7 BLACK + β8 HISPANIC + β9 ASIAN + β10 OTHER + β11 MALE …
Multivariate Regression
 Example
GPA = β0 + β1 TREATMENT + β2 SAT + β3 HSGPA + β4 PELL + β5 GREEK + β6 ONCAMPUS
+ β7 BLACK + β8 HISPANIC + β9 ASIAN + β10 OTHER + β11 MALE …
 Tear Down Approach
Full Model: GPA = β0 + β1 TREATMENT + β2 SAT + β3 HSGPA + β4 PELL + β5 GREEK
+ β6 ONCAMPUS + β7 BLACK + β8 HISPANIC + β9 ASIAN + β10 OTHER + β11 MALE …
Model 2: GPA = β0 + β1 TREATMENT + β2 SAT + β3 HSGPA + β4 GREEK
+ β5 ONCAMPUS + β6 BLACK + β7 HISPANIC + β8 ASIAN + β9 OTHER + β10 MALE …
Multivariate Regression
Model is strong.
Model explains roughly 21% of the variance.
Those receiving the treatment are
associated with a 0.76 increase to
GPA.
Treatment is a statistically
significant predictor of GPA.
Multivariate Regression
Model is strong.
Model now explains roughly 90% of the
variance. A vastly improved model fit.
Multivariate Regression
Treatment is no longer
statistically significant
once controlling for SAT &
HSGPA.
A 100 point increase in SAT scores
is associated with a 0.13 increase
to GPA, holding all else constant.
A 1 point increase in HSGPA is
associated with a 0.60 increase to
GPA, holding all else constant.
But both SAT & HSGPA
are statistically
significant.
Multivariate Regression
This essentially means that treatment had no effect once we control
for previous academic preparation.
Multivariate Regression
Model is strong.
Model explains roughly 89% of the variance.
Multivariate Regression
SAT and HSGPA are largely
unchanged and both still
significant.
This means that race/ethnicity and gender are not strong predictors
of GPA after controlling for previous academic preparation.
You could, arguably, exclude race/ethnicity and gender from the
model if they are not relevant to theory or previous literature.
None of the race/ethnicity or
gender variables are statistically
significant.
Multivariate Regression
Model is strong.
Model explains roughly 91% of the variance.
Multivariate Regression
SAT and HSGPA are largely
unchanged and both still
significant.
This means that SAT and HSGPA are the key predictors of GPA.
None of the controls are statistically
significant.
Multivariate Regression
But look at how far off we could have been…
We could have found that Greeks were associated with a lower
GPA.
Or that living on campus was associated with a higher GPA.
Or that Pell recipients were associated with a lower GPA.
Or that Asian students had a higher GPA than White students.
Multivariate Regression
But what we found was that SAT score and HSGPA were the only aspects that were associated with a
change in GPA.
After controlling for SAT and HSGPA, nothing else significantly impacts GPA.
Multivariate Regression - Example
𝑦 = GPA at Graduation
Key variables of interest
Control variables
This is an easy way to summarize
significance levels. The more
stars, the greater the confidence
level.
Appendix 2a. Results of OLS Regressions,
Graduating GPA as Dependent Variable (Students Entering Fall 2007)
Variable                    Coefficient
UROP                        0.12***
Study Abroad                0.06**
COOP                        0.13***
Internship                  0.11***
Minor                       0.12***
International Plan          0.20
Greek                       -0.04
NCAA Athlete                0.26***
Length Lived On-Campus      -0.01
Pell Recipient              -0.04
SAT Math                    0.13***
SAT Verbal                  0.05***
High School GPA             0.73***
Asian                       0.01
Black                       -0.11*
Hispanic                    -0.07
Other                       -0.01
International               0.04
White (comparison group)    --
Male                        -0.02
GA Resident                 -0.01
Intercept                   -0.84***
N                           2125
Adjusted R-Squared          0.27
* p < 0.05 ** p < 0.01 *** p < 0.001
Actual R² values will be more in this range, not
0.9 as with the constructed data
Multivariate Regression - Example
Participation in the undergraduate
research program is associated with a 0.12
higher GPA at graduation.
What’s potentially wrong with this?
GPA is a requirement for participation in
the program.
Multivariate Regression - Example
Participation in the international plan
had no significant relationship with GPA
at graduation.
Multivariate Regression - Example
Not surprising to find that academic
preparation was strongly associated with
GPA.
Multivariate Regression - Example
This is a helpful way to identify
the comparison group for
categorical variables.
Black students are associated with a lower GPA
at graduation. Unfortunately, this trend is
prominent in the literature.
Multivariate Regression - Example
Appendix 2a. Results of OLS Regressions,
Graduating GPA as Dependent Variable (Students Entering Fall 2007)
Variable                    Coefficient
UROP                        0.12***
Study Abroad                0.06**
COOP                        0.13***
Internship                  0.11***
Minor                       0.12***
International Plan          0.20
Greek                       -0.04
NCAA Athlete                0.26***
Length Lived On-Campus      -0.01
Pell Recipient              -0.04
SAT Math (100s)             0.13***
SAT Verbal (100s)           0.05***
High School GPA             0.73***
Asian                       0.01
Black                       -0.11*
Hispanic                    -0.07
Other                       -0.01
International               0.04
White (comparison group)    --
Male                        -0.02
GA Resident                 -0.01
Intercept                   -0.84***
N                           2125
Adjusted R-Squared          0.27
* p < 0.05 ** p < 0.01 *** p < 0.001
Athletes are associated with a higher GPA at
graduation.
Why might this be?
1. Only athletes who graduate.
2. Small sample size.
3. Additional tutoring.
4. All sports.
5. Other ideas?
Multivariate Regression - Exercise
 Open the mreg tab in the Exercises workbook and estimate models for end of course
performance in PHYSICS 2102.
 Which variables did you include in your model?
 Which variables are statistically significant?
 What is the magnitude of the variables?
 Which variable has the largest coefficient?
 When finished, take a short break.
Multivariate Regression - Exercise
Multivariate Regression - Exercise
Can’t include every combination of a
categorical variable.
Sets “White” as the comparison group,
which is why everything is 0.
Multivariate Regression - Exercise
Model is strong.
Model fit is strong.
Multivariate Regression - Exercise
Performance in PHYS 2101 is positively associated with performance in PHYS 2102.
A 1 point increase in your PHYS 2101 grade is associated with a 0.65 point increase in your PHYS 2102 grade.
Multivariate Regression - Exercise
High School GPA is positively associated with performance in PHYS 2102.
A 1 point increase in your High School GPA is associated with an 11.19 point increase in your PHYS 2102 grade.
Multivariate Regression - Exercise
Black students perform worse in PHYS 2102 than White students.
Black students are associated with a 3.47 point lower score in PHYS 2102 than White students.
Multivariate Regression - Exercise
Hispanic students perform better in PHYS 2102 than White students.
Hispanic students are associated with a 4.67 point higher score in PHYS 2102 than White students.
Can’t say anything from this model about how well Hispanic students do when compared to Black students.
Multivariate Regression - Exercise
Male students perform better in PHYS 2102 than Female students.
Male students are associated with a 4.25 point higher score in PHYS 2102 than Female students.
Multivariate Regression - Exercise
Height is negatively associated with performance in PHYS 2102.
A 1 inch increase in your height is associated with a 0.51 point decrease in your PHYS 2102 grade.
Multivariate Regression - Exercise
This makes no sense! Why would height be correlated with Physics scores?
It’s not. I created this variable to be correlated with gender. We should drop it.
Multivariate Regression - Exercise
 Significant Variables in Full Model
 PHYS 2101 Performance (0.65)
 High School GPA (11.19)
 Black Students (-3.47)
 Hispanic Students (4.67)
 Male Students (4.25)
 Height (-0.51)
Multivariate Regression - Exercise
Performance in PHYS 2101 is positively associated with performance in PHYS 2102.
A 1 point increase in your PHYS 2101 grade is associated with a 0.62 point increase in your PHYS 2102 grade.
Multivariate Regression - Exercise
High School GPA is positively associated with performance in PHYS 2102.
A 1 point increase in your High School GPA is associated with a 10.84 point increase in your PHYS 2102 grade.
Multivariate Regression - Exercise
Asian students perform better in PHYS 2102 than White students (just over α = 0.05).
Asian students are associated with a 3.06 point higher score in PHYS 2102 than White students.
Multivariate Regression - Exercise
Black students perform worse in PHYS 2102 than White students.
Black students are associated with a 3.50 point lower score in PHYS 2102 than White students.
Multivariate Regression - Exercise
Hispanic students perform better in PHYS 2102 than White students.
Hispanic students are associated with a 3.76 point higher score in PHYS 2102 than White students.
Multivariate Regression - Exercise
FULL MODEL                        MODEL EXCLUDING HEIGHT
PHYS 2101 Performance (0.65)      PHYS 2101 Performance (0.62)
High School GPA (11.19)           High School GPA (10.84)
Black Students (-3.47)            Black Students (-3.50)
Hispanic Students (4.67)          Hispanic Students (3.76)
Male Students (4.25)              Male Students (NS)
Height (-0.51)
Multivariate Regression - Exercise
 So what’s the right answer?
 I built the data with correlations between PHYS 2102 and:
 PHYS 2101
 SAT & HSGPA (which were correlated)
 Race/Ethnicity
 Gender
 Tutoring was randomly generated
Multivariate Regression - Exercise
 Why was SAT not significant?
 SAT was not significant because it’s highly correlated with HSGPA and that’s what’s capturing the
relationship
 Why was gender not significant?
 Because it’s being captured by other variables such as HSGPA and race/ethnicity.
 Why didn’t we include height if it was significant in the full model?
 It has nothing to do with PHYS 2102; it reflects a correlation with gender.
Multivariate Regression - Exercise
 Why did we include tutoring if it’s not significant?
 You could drop it, but it’s theoretically significant to the model.
 Not much difference in coefficients. No relationship with PHYS 2102 nor many other variables.
 Want to understand if tutoring was associated with performance in PHYS 2102.
Multivariate Regression - Exercise
 Regression results hold all other variables constant, isolating the impact of only the specific
variable.
 The only thing you are changing is that variable of interest.
 Similar to comparing a male to a female with the exact same HSGPA, SAT, tutoring, race/ethnicity, etc.
Multivariate Regression - Exercise
 So what’s the “right” answer?
 But you won’t know this in the real world.
 The data won’t be constructed.
 Use theory, previous literature, and common sense to
develop your models.
(Don’t include height as a predictor of performance in PHYS 2102)
Multivariate Regression - Exercise
 So what’s the “right” answer?
 But you won’t know this in the real world.
 The data won’t be constructed.
 Use theory, previous literature, and common sense to
develop your models.
 It’s easy to find correlations, but they should be justified.
(Morning gym attendance might be correlated with a higher GPA,
but it’s due to high motivation, not workout behavior.
Don’t force students to go to the gym at 5:30am to try to boost GPAs.)
Logistic Regression
 Basic Correlation
 Looks at the relationship between multiple independent variables and the dependent variable
Pr(𝑦 = 1 | 𝑋) = β0 + β1 𝑋1 + β2 𝑋2 + β3 𝑋3 + …
 y is the binary dependent variable (1 = yes 0 = no)
Logistic Regression
 β will yield the increase or decrease in the probability of 𝑦 occurring
 A 1 unit increase in 𝑋1 is associated with a 20% increase in the probability of 𝑦
 𝑦 is now a predicted probability between 0 and 1
 White students with a HSGPA of 3.5 and SAT score of 1450 have a probability of 0.71 of graduating
within 6 years
 Coefficients are difficult to interpret (raw coefficients are in log-odds; exponentiated, they are odds ratios)
 Resort to more likely or less likely without a magnitude
 Not much difference between logistic regression and linear probability estimation when looking
around the mean
 If coefficients don’t vary between the models, use linear probability for simplicity
Logistic Regression
 Let’s brainstorm some examples where you’d need to use logistic regression.
 Retention
 Pass / Fail
 AA Graduation in 2 years
 Grade
 AA Graduation in 3 years
 Yield Rate
 BA Graduation in 4 years
 Major
 BA Graduation in 6 years
 FT / PT
Logistic Regression
79% retention rate
69% in-state students
53% female
Linear Probability Model
Linear Probability Model
A 100 point increase in SAT score is associated with a 13 percentage point increase in the probability of being retained.
In-state students are 26 percentage points more likely than out-of-state students to be retained.
Linear Probability Model
An out-of-state, Asian, male student with a SAT score of 1250 and HS GPA of 3.0
would have what probability of being retained?
Pr(𝑦 = 1 | 𝑋) = −1.56 + 0.00132(1250) + 0.132(3.0) + 0.259(0) − 0.048(1) − 0.026(0) − 0.006(0) + 0.064(0) − 0.045(1)
Pr(𝑦 = 1 | 𝑋) = 0.39
Linear Probability Model
An in-state, white, female student with a SAT score of 1500 and HS GPA of 3.9
would have what probability of being retained?
Pr(𝑦 = 1 | 𝑋) = −1.56 + 0.00132(1500) + 0.132(3.9) + 0.259(1) − 0.048(0) − 0.026(0) − 0.006(0) + 0.064(0) − 0.045(0)
Pr(𝑦 = 1 | 𝑋) = 1.19
Linear Probability Model
Pr(𝑦 = 1 | 𝑋) = 1.19
Doesn’t make sense. Can’t have a probability of over 1.
Linear Probability Models fail when looking at the extremes (high SAT & GPA).
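The two linear probability predictions can be reproduced with a short sketch. The coefficients are the ones shown in the equations; the variable names are assumptions based on the order of the terms, and the three terms that are multiplied by 0 in both examples are left out because the slides do not name them.

```python
# Coefficients from the linear probability model (names are assumed labels)
COEFFICIENTS = {
    "intercept": -1.56,
    "sat": 0.00132,
    "hs_gpa": 0.132,
    "in_state": 0.259,
    "asian": -0.048,
    "male": -0.045,
}

def linear_probability(sat, hs_gpa, in_state, asian, male):
    """Predicted probability of retention; NOT bounded between 0 and 1."""
    c = COEFFICIENTS
    return (c["intercept"] + c["sat"] * sat + c["hs_gpa"] * hs_gpa
            + c["in_state"] * in_state + c["asian"] * asian + c["male"] * male)

# Out-of-state, Asian, male, SAT 1250, HS GPA 3.0
p_first = linear_probability(1250, 3.0, in_state=0, asian=1, male=1)
# In-state, White, female, SAT 1500, HS GPA 3.9
p_second = linear_probability(1500, 3.9, in_state=1, asian=0, male=0)
```

The second prediction exceeds 1, which is the failure at the extremes described above.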
Logistic Regression
The problem with logistic regression is that the coefficients are difficult to interpret.
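From:
$\Pr(y = 1 \mid X) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots$
To:
$\Pr(y = 1 \mid X) = \dfrac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots)}}$
Or, more specifically:
$\Pr(Y_i = y_i \mid X_i) = \left(\dfrac{1}{1 + e^{-\beta X_i}}\right)^{y_i} \left(1 - \dfrac{1}{1 + e^{-\beta X_i}}\right)^{1 - y_i}$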
Logistic Regression
There are two ways of interpreting logistic regression.
 Maximum Likelihood (the raw coefficients, which are in log-odds)
 Odds Ratios (the exponentiated coefficients)
Logistic Regression
Maximum Likelihood
 If positive, more likely
 Higher SAT associated with higher probability of retaining
 Higher HS GPA associated with higher probability of retaining
 If negative, less likely
 Asian students less likely than Whites to be retained
 Males less likely than females to be retained
 Magnitudes are exponential estimates
 Can’t say much about how 𝑋 affects 𝑦
 Non-linear. No constant β. Have to identify values.
White, in-state, female with 1500 and 3.9
$\Pr(y = 1 \mid X) = \dfrac{e^{-19.75 + 0.015(1500) + 0.405(3.9) + 2.25(1)}}{e^{-19.75 + 0.015(1500) + 0.405(3.9) + 2.25(1)} + 1} = 0.99$
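As a rough check on the calculation above, the sketch below plugs the same student into the logistic formula. The coefficients are the rounded values shown on the slide, so the last digit may differ slightly from the reported 0.99; the key names are illustrative assumptions.

```python
import math

# Rounded logit coefficients from the slide; the names are assumptions.
LOGIT_COEFS = {"intercept": -19.75, "sat": 0.015, "hs_gpa": 0.405, "in_state": 2.25}

def logistic(z):
    """Map any real-valued index z to a probability strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# White, in-state, female student with SAT 1500 and HS GPA 3.9
z = (LOGIT_COEFS["intercept"] + LOGIT_COEFS["sat"] * 1500
     + LOGIT_COEFS["hs_gpa"] * 3.9 + LOGIT_COEFS["in_state"] * 1)
p = logistic(z)

# The e^z / (e^z + 1) form used on the slide is the same function.
p_alt = math.exp(z) / (math.exp(z) + 1.0)

print(p, p_alt)
```

Unlike the linear probability model, which gave 1.19 for this student, the logistic prediction stays below 1.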
Logistic Regression
Asian, out-of-state, male with 1250 and 3.0
$\Pr(y = 1 \mid X) = \dfrac{e^{-19.75 + 0.015(1250) + 0.405(3.0) - 0.78(1) - 1.11(1)}}{e^{-19.75 + 0.015(1250) + 0.405(3.0) - 0.78(1) - 1.11(1)} + 1} = 0.19$
Logistic Regression
Odds Ratios
 If >1, more likely
 Higher SAT associated with higher probability of retaining
 Higher HS GPA associated with higher probability of retaining
 If <1, less likely
 Asian students less likely than Whites to be retained
 Males less likely than females to be retained
 Direction of the association is the same
 Non-linear. No constant β. Have to identify values.
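A minimal sketch of how the two presentations relate: exponentiating a logit coefficient gives its odds ratio. The coefficients are the rounded values from the worked examples; which of the two negative coefficients belongs to the Asian dummy and which to the male dummy is an assumption based on the order of the terms.

```python
import math

# Rounded logit coefficients from the worked examples; the labels are assumptions.
logit_coefs = {
    "sat": 0.015,
    "hs_gpa": 0.405,
    "in_state": 2.25,
    "asian": -0.78,
    "male": -1.11,
}

# Odds ratio = exp(coefficient). A positive coefficient maps to an odds ratio
# above 1, a negative coefficient to an odds ratio below 1.
odds_ratios = {name: math.exp(b) for name, b in logit_coefs.items()}

for name, ratio in odds_ratios.items():
    direction = "more likely" if ratio > 1 else "less likely"
    print(f"{name}: odds ratio {ratio:.2f} ({direction})")
```

The direction of each association is the same in both forms; only the scale changes.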
Linear Probability - Example
𝑦 = Pr(Graduating)
Key variables of interest
Control variables
Appendix 4a. Results of Linear Probability,
Graduation within 6 Years as Dependent Variable (Students Entering Fall 2007)

Variable                    Coefficient
UROP                        0.14***
Study Abroad                0.10***
COOP                        0.11***
Internship                  0.12***
Minor                       0.11***
International Plan          0.06
Greek                       0.10***
NCAA Athlete                0.07
Length Lived On-Campus      0.08***
Pell Recipient              0.02
SAT Math                    0.03*
SAT Verbal                  -0.02*
High School GPA             0.17***
Asian                       0.01
Black                       -0.04
Hispanic                    -0.08*
Other                       -0.08
International               -0.02
White (comparison group)    --
Male                        -0.01
GA Resident                 0.07***
Intercept                   -0.21
N                           2611
Adjusted R-Squared          0.21
* p < 0.05 ** p < 0.01 *** p < 0.001
GPA is still a requirement for participation in many of these programs, and GPA is correlated with the probability of graduating.
Linear Probability - Example
Appendix 3a & 4a. Results of Linear Probability (Students Entering Fall 2007)

Variable                    Graduation within 4 Years    Graduation within 6 Years
UROP                        0.13***                      0.14***
Study Abroad                0.06**                       0.10***
COOP                        -0.27***                     0.11***
Internship                  -0.00                        0.12***
Minor                       0.04                         0.11***
International Plan          -0.06                        0.06
Greek                       -0.00                        0.10***
NCAA Athlete                0.25***                      0.07
Length Lived On-Campus      0.04***                      0.08***
Pell Recipient              -0.07**                      0.02
SAT Math                    0.05**                       0.03*
SAT Verbal                  0.01                         -0.02*
High School GPA             0.38***                      0.17***
Asian                       0.05                         0.01
Black                       -0.02                        -0.04
Hispanic                    -0.07                        -0.08*
Other                       0.03                         -0.08
International               0.05                         -0.02
White (comparison group)    --                           --
Male                        -0.08***                     -0.01
GA Resident                 -0.01                        0.07***
Intercept                   -1.53***                     -0.21
N                           2611                         2611
Adjusted R-Squared          0.15                         0.21
* p < 0.05 ** p < 0.01 *** p < 0.001
Undergraduate research and study abroad associated with higher probabilities of graduating.
Students who participate in a co-op program were less likely to graduate within 4 years but more likely to graduate within 6 years.
Students who participate in an internship or who declare a minor were more likely to graduate within 6 years, but were no more likely to graduate within 4 years.
As with GPA, the international plan program had no impact on probability of graduation.
Students who participate in Greek activities were more likely to graduate within 6 years.
Student athletes were more likely to graduate within 4 years, but no more likely to graduate within 6 years.
The longer a student lives on campus, the more likely they are to graduate within 4 or 6 years.
Pell recipients and men less likely to graduate within 4 years.
Students with strong academic preparation more likely to graduate within 4 or 6 years. Though those with high SAT Verbal scores slightly less likely to graduate within 6 years? Tech.
Hispanic students less likely to graduate within 6 years when compared to White students.
In-state students more likely to graduate within 6 years.
Linear Probability - Example
 Takeaways
 Most programs are successful in increasing probabilities of graduation
 Co-Op delays graduation (1 year program)
 International Plan has no impact on graduation
 Requirements to be admitted to the programs suggest that these bright students would likely have graduated on time anyhow
 Academic preparation, engagement activities, and student characteristics are all associated with the
probability of graduating on time
Logistic Regression - Exercise
 Open the lot tab in the Exercises workbook and estimate models for graduation in 4 years using
both the linear probability model and the logistic regression model.
 Interpret the coefficients and statistical significance in the linear probability model.
 What is the probability of graduating within 4 years in the linear probability model for a student
with an Academic Integration Index and Social Integration Index at their respective means?
 In the logistic regression model, are students with a greater Academic Integration Index score
more or less likely to graduate within 4 years?
 In the logistic regression model, are students with a greater Social Integration Index score more
or less likely to graduate within 4 years?
 What is the probability of graduating within 4 years in the logistic regression model for a
student with an Academic Integration Index and Social Integration Index at their respective
means?
Logistic Regression - Exercise
Academic Integration Index mean is 34.24
Social Integration Index mean is 24.83.
Logistic Regression - Exercise
A 1 unit increase in the Academic Integration Index is associated with a 0.7 percentage point increase in the probability of graduating within 4 years, holding all else constant.
A 1 unit increase in the Social Integration Index is associated with a 1.2 percentage point increase in the probability of graduating within 4 years, holding all else constant.
Logistic Regression - Exercise
A student with an Academic Integration Index score of 34.24 and Social Integration Index score of 24.83
would be predicted to have a 0.40 probability of graduating within 4 years.
The linear regression evaluated at the means of the 𝑋 variables gives you the mean of the 𝑦 variable.
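The property in the last sentence can be checked with a small sketch. The data below are made up for illustration (they are not the workshop's Exercises workbook), and only one regressor is used to keep the arithmetic in plain Python; the same result holds with several regressors as long as the model has an intercept.

```python
# Hypothetical data: an integration index score and a 0/1 graduation flag.
index_scores = [28, 31, 34, 36, 40, 37]
graduated = [0, 0, 1, 0, 1, 1]

def mean(values):
    return sum(values) / len(values)

x_bar = mean(index_scores)
y_bar = mean(graduated)

# Ordinary least squares with a single regressor and an intercept.
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(index_scores, graduated))
         / sum((x - x_bar) ** 2 for x in index_scores))
intercept = y_bar - slope * x_bar

# Predicted probability for a student exactly at the mean of X.
prediction_at_mean = intercept + slope * x_bar

print(prediction_at_mean, y_bar)
```

Because the intercept is defined as the mean of 𝑦 minus the slope times the mean of 𝑋, the fitted line always passes through the point of means.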
Logistic Regression - Exercise
A higher Academic Integration Index is associated with a greater probability of graduating within 4 years,
holding all else constant.
A higher Social Integration Index is associated with a greater probability of graduating within 4 years,
holding all else constant.
Logistic Regression - Exercise
A student with an Academic Integration Index score of 34.24 and Social Integration Index score of 24.83
would be predicted to have a 0.85 probability of graduating within 4 years.
This is much higher than the 0.40 from the linear regression. Those at or above the mean are much more likely to graduate within 4 years than those below the mean.
Break
Propensity Score Matching
[Figure: individuals in the TREATMENT GROUP and the CONTROL GROUP]
Propensity Score Matching
TREATMENT GROUP    CONTROL GROUP    TREATMENT EFFECT
82                 79               3
86                 82               4
85                 81               4
82                 80               2
81                 79               2
80                 78               2
Average treatment effect: 2.83
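The 2.83 above is simply the average of the matched-pair differences. A minimal sketch using the six pairs from the slide (the variable names are illustrative):

```python
# (treatment outcome, control outcome) for each matched pair on the slide.
matched_pairs = [(82, 79), (86, 82), (85, 81), (82, 80), (81, 79), (80, 78)]

# Treatment effect for each pair, then the average across pairs.
pair_effects = [treated - control for treated, control in matched_pairs]
average_effect = sum(pair_effects) / len(pair_effects)

print(pair_effects, round(average_effect, 2))
```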
Propensity Score Matching
 The question with matching is how you determine a good match.
 Develop propensity scores (weighted values using logistic regression on a series of variables)
 Match those with close propensity scores
 1-to-1 match with replacement
 1-to-1 match without replacement
 Multiple matches
 Many more varieties
 Compare matches on outcome OR use matches in multiple regression
Propensity Score Matching
 Multivariate Regression
GPA = β0 + β1 TREATMENT + β2 ACADEMIC_PREP + β3 CONTROLS
 Propensity Scores
Pr(TREATMENT) = β0 + β1 ACADEMIC_PREP + β2 CONTROLS
 Where the 𝑋 variables are characteristics and variables associated with whether they chose to
participate in the treatment
 The 𝑋 variables should be pre-treatment characteristics: they cannot be affected by the treatment or by the eventual outcome
 Develops Pr(TREATMENT) for each individual based on this regression and their characteristics
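A minimal sketch of the propensity score step, assuming a single standardized academic preparation variable and simulated participation data. Everything here is hypothetical (the data, the true coefficients, and the hand-written fitting routine); in practice a statistics package would estimate the logistic regression.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, treated, learning_rate=0.1, steps=2000):
    """Fit Pr(TREATMENT) = sigmoid(b0 + b1*x1 + ...) by gradient ascent on the log-likelihood."""
    beta = [0.0] * (len(rows[0]) + 1)  # beta[0] is the intercept
    for _ in range(steps):
        gradient = [0.0] * len(beta)
        for x, t in zip(rows, treated):
            p = sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))
            gradient[0] += t - p
            for j, v in enumerate(x):
                gradient[j + 1] += (t - p) * v
        beta = [b + learning_rate * g / len(rows) for b, g in zip(beta, gradient)]
    return beta

# Simulated students: better-prepared students are more likely to participate.
random.seed(0)
rows, treated = [], []
for _ in range(200):
    academic_prep = random.gauss(0.0, 1.0)
    true_probability = sigmoid(-0.5 + 1.0 * academic_prep)
    rows.append([academic_prep])
    treated.append(1 if random.random() < true_probability else 0)

beta = fit_logistic(rows, treated)

# Each student's propensity score is their fitted Pr(TREATMENT).
propensity = [sigmoid(beta[0] + beta[1] * x[0]) for x in rows]
print(beta, min(propensity), max(propensity))
```

Every student gets a score, whether or not they actually participated; the matching step then compares participants and non-participants with similar scores.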
Propensity Score Matching
Pr(TREATMENT) = β0 + β1 ACADEMIC_PREP + β2 CONTROLS
 The Pr(TREATMENT) for each individual is their propensity score – the propensity that they
participated in the treatment
 Use this score to find a match to someone with a similar propensity, but who didn’t participate
in the treatment
 Not everyone will have a match. Analysis limited to students who overlap.
 If everyone with a SAT score over 1500 participated, there would be no students left to compare against who didn’t participate.
 Only matches based on observed characteristics. Omitted variable bias still possible in developing
matches.
Propensity Score Matching
 Matching
 Using the propensity scores (Pr(TREATMENT))
 Nearest Neighbor – find closest match
TREATMENT GROUP: 0.599794, 0.726072, 0.827099, 0.724683, 0.551767, 0.029345, 0.279154, 0.686264, 0.006545, 0.946441
CONTROL GROUP: 0.325386, 0.585769, 0.543578, 0.076511, 0.027491, 0.922861, 0.861053, 0.995708, 0.611816, 0.590846
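Nearest neighbor matching on the scores above can be sketched as follows. The scores are copied from the slide; the function name is illustrative, and the sketch matches with replacement (each treated individual searches the full control group).

```python
# Propensity scores copied from the slide.
treatment = [0.599794, 0.726072, 0.827099, 0.724683, 0.551767,
             0.029345, 0.279154, 0.686264, 0.006545, 0.946441]
control = [0.325386, 0.585769, 0.543578, 0.076511, 0.027491,
           0.922861, 0.861053, 0.995708, 0.611816, 0.590846]

def nearest_neighbor(score, pool):
    """Return the control score closest to the given treatment score."""
    return min(pool, key=lambda c: abs(c - score))

# With replacement: the same control individual may be picked more than once.
matches = {t: nearest_neighbor(t, control) for t in treatment}

for t, c in matches.items():
    print(f"treated {t:.6f} -> control {c:.6f} (gap {abs(t - c):.6f})")
```

Some gaps are tiny and others are large, which is the motivation for restricting matches to a band around each score.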
Propensity Score Matching
 Matching
 Using the propensity scores (Pr(TREATMENT))
 Nearest Neighbor – find closest match
 Caliper – find closest match within a given band
 Caliper <= 0.01
 The rest have no overlap (no matches)
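A minimal sketch of caliper matching with the band of 0.01 named on the slide. The scores are the ones shown for the treatment and control groups; the variable names are illustrative.

```python
# Propensity scores copied from the slide.
treatment = [0.599794, 0.726072, 0.827099, 0.724683, 0.551767,
             0.029345, 0.279154, 0.686264, 0.006545, 0.946441]
control = [0.325386, 0.585769, 0.543578, 0.076511, 0.027491,
           0.922861, 0.861053, 0.995708, 0.611816, 0.590846]

CALIPER = 0.01  # maximum allowed gap between matched scores

matched, unmatched = [], []
for t in treatment:
    closest = min(control, key=lambda c: abs(c - t))
    if abs(t - closest) <= CALIPER:
        matched.append((t, closest))   # close enough to count as a match
    else:
        unmatched.append(t)            # no control individual within the band

print(len(matched), "matched;", len(unmatched), "without overlap")
```

With these scores only a few treated individuals find a control within the band; the rest drop out of the analysis.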
Propensity Score Matching
 Matching
 Using the propensity scores (Pr(TREATMENT))
 Nearest Neighbor – find closest match
 Caliper – find closest match within a given band
 Replacement
 With replacement – multiple treatment group individuals can be matched to the same control group individual
Propensity Score Matching
 Matching
 Using the propensity scores (Pr(TREATMENT))
 Nearest Neighbor – find closest match
 Caliper – find closest match within a given band
 Replacement
 With replacement – multiple treatment group individuals can be matched to the same control group individual
 Without replacement – control group individual is matched to single treatment individual
 Closest match – match closest treatment and control group scores
Propensity Score Matching
 Matching
 Using the propensity scores (Pr(TREATMENT))
 Nearest Neighbor – find closest match
 Caliper – find closest match within a given band
 Replacement
 With replacement – multiple treatment group individuals can be matched to the same control group individual
 Without replacement – control group individual is matched to single treatment individual
 Closest match – match closest treatment and control group scores
 First match – treatment scores are sorted and matched 1-to-1 to a control individual
TREATMENT GROUP (sorted): 0.006545, 0.029345, 0.279154, 0.551767, 0.599794, 0.686264, 0.724683, 0.726072, 0.827099, 0.946441
CONTROL GROUP (sorted): 0.027491, 0.076511, 0.325386, 0.543578, 0.585769, 0.590846, 0.611816, 0.861053, 0.922861, 0.995708
Propensity Score Matching
 Matching
 Using the propensity scores (Pr(TREATMENT))
 Nearest Neighbor – find closest match
 Caliper – find closest match within a given band
 Replacement
 With replacement – multiple treatment group individuals
can be matched to the same control group individual
 Without replacement – control group individual is matched
to single treatment individual
 Closest match – match closest treatment and control group scores
 First match – treatment scores are sorted and matched 1-to-1 to a control
individual
My preferred model.
 Ensures close, accurate matches.
 Eliminates individuals with no match /
poor matches.
 But make sure one control observation
does not dominate the matches.
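The warning about one control observation dominating the matches can be checked by counting how often each control individual is reused. The sketch below does that for matching with replacement, and for comparison runs a simple version of the sorted, 1-to-1 matching without replacement. The scores come from the earlier slides; the exact without-replacement procedure is an assumption, since the slides do not spell it out.

```python
from collections import Counter

# Propensity scores copied from the slides.
treatment = [0.599794, 0.726072, 0.827099, 0.724683, 0.551767,
             0.029345, 0.279154, 0.686264, 0.006545, 0.946441]
control = [0.325386, 0.585769, 0.543578, 0.076511, 0.027491,
           0.922861, 0.861053, 0.995708, 0.611816, 0.590846]

def closest(score, pool):
    return min(pool, key=lambda c: abs(c - score))

# With replacement: count how many treated individuals share each control.
with_replacement = [(t, closest(t, control)) for t in treatment]
reuse = Counter(c for _, c in with_replacement)

# Without replacement (assumed procedure): sort the treated scores and let each
# take the closest control that has not been used yet.
available = list(control)
without_replacement = []
for t in sorted(treatment):
    c = closest(t, available)
    available.remove(c)
    without_replacement.append((t, c))

print("most reused controls:", reuse.most_common(2))
print("largest gap without replacement:",
      max(abs(t - c) for t, c in without_replacement))
```

A control that appears several times carries extra weight in the comparison, while matching without replacement avoids reuse at the cost of some wider gaps.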
Propensity Score Matching
Those who participated in a test prep program had an SAT score that was roughly 400 points higher than
those who did not.
Propensity Score Matching
But look at how much higher the GPA was for students who participated in the test prep program. Maybe this is what was causing the difference in SAT scores.
Propensity Score Matching
Could only find 7 matches out of 150 people.
This means that the treatment and control group differed a lot.
Lack of overlap.
Propensity Score Matching
For these 7 matches, there was only a 191 point difference in SAT scores.
Much different than the 400 point difference observed earlier.
Propensity Score Matching
This time we have the opposite.
There is a 294 point difference, but the student demographics are fairly equivalent, including GPA.
This means there should be a lot more matches.
Propensity Score Matching
Not only did 40/40 = 100% of the treatment population have matches, but there was also a significant difference of 328.8.
This is 34 points higher (12%) than just comparing participants to non-participants.
Propensity Score Matching - Example
Variable                    GPA at Graduation            Probability of Graduating within 6 Years
                            (Multivariate Regression)    (Linear Probability)
UROP                        0.12***                      0.14***
Study Abroad                0.06**                       0.10***
COOP                        0.13***                      0.11***
Internship                  0.11***                      0.12***
Minor                       0.12***                      0.11***
International Plan          0.20                         0.06
Greek                       -0.04                        0.10***
NCAA Athlete                0.26***                      0.07
Length Lived On-Campus      -0.01                        0.08***
Pell Recipient              -0.04                        0.02
SAT Math                    0.13***                      0.03*
SAT Verbal                  0.05***                      -0.02*
High School GPA             0.73***                      0.17***
Asian                       0.01                         0.01
Black                       -0.11*                       -0.04
Hispanic                    -0.07                        -0.08*
Other                       -0.01                        -0.08
International               0.04                         -0.02
White (comparison group)    --                           --
Male                        -0.02                        -0.01
GA Resident                 -0.01                        0.07***
Intercept                   -0.84***                     -0.21
N                           2125                         2611
Adjusted R-Squared          0.27                         0.21
* p < 0.05 ** p < 0.01 *** p < 0.001
Both models could be biased because there are requirements to be admitted to these programs.
Propensity Score Matching - Example
Appendix 1b. Logistic Regressions to Develop Propensity Scores
UROP
Study Abroad
COOP
Greek
-0.15
0.8
0.30
NCAA Athlete
-1.56
-3.23
Length Lived On-Campus
0.17
0.16
Pell Recipient
0.45
-0.30
SAT Math
0.37
0.22
0.26
SAT Verbal
0.23
0.14
-0.14
High School GPA
0.67
0.53
1.26
Asian
0.37
-0.16
-0.28
Black
-0.58
Hispanic
0.38
0.33
Other
0.46
International
0.54
White (comparison group)
Male
-0.41
-0.56
0.46
GA Resident
-0.37
-0.30
ROC
0.66
0.67
0.64
GOF
0.66
0.45
0.74
Note: Only coefficients significant at the p < 0.001 level are included.
Internship
0.49
0.20
0.64
0.31
0.78
0.27
0.51
0.53
-0.22
0.64
0.16
Minor
Int’l Plan
0.13
0.17
0.42
0.37
-0.40
0.23
0.64
0.07
0.71
0.70
0.41
These programs are now the dependent variable. We’re estimating probabilities of participating in these
programs given student characteristics.
Greeks very involved. Athletes don’t have time to study abroad. Academic preparation linked to most
programs. Etc.
Receiver Operating Characteristic Curve (ROC) and Goodness of Fit (GOF) are estimates of model fit.
These scores are really low. Suggests the models are not strong and matches will likely be poor.
Propensity Score Matching - Example
Tables 2, 3. Probabilities of Participation
UROP
Greek
NCAA Athlete
Length Lived On-Campus
Pell Recipient
SAT Math
SAT Verbal
High School GPA
Asian
Black
Hispanic
Other
International
White (comparison group)
Male
GA Resident
ROC
GOF
Study Abroad
↓
--↑
↑
↑
↑
↑
↑
--↑
↑
↑
↑
↓
↑
↓
↑
↑
↑
↓
↓
↑
-----
↓
--0.66
0.66
↓
↓
0.67
0.45
Another way to present probabilities.
COOP
↑
↓
----↑
↓
↑
↓
---------
Internship
↑
--↑
↑
↑
--↑
↑
↑
↑
-----
(Comparison Group)
↑
↓
↓
--0.64
0.64
0.74
0.16
Minor
Int’l Plan
----↑
--↑
↑
↑
-----------
----------↑
-------------
↓
↑
0.64
0.07
----0.70
0.41
Propensity Score Matching - Example
 Each individual’s characteristics are then applied to this regression to develop propensity scores
for the probability of their participation
 These propensity scores are then matched to non-participants
Propensity Score Matching - Example
Appendix 1a. Results of Propensity Score Matching

Outcome        Program         Estimate    Treated    Control    Matches
GPA            UROP            0.16***     4045       17,650     2939
GPA            Study Abroad    0.14***     4725       16,970     3319
GPA            COOP            0.12***     4351       17,344     3091
4-Year Grad    UROP            0.11***     4045       17,650     2684
4-Year Grad    Study Abroad    0.07***     4725       16,970     4239
4-Year Grad    COOP            -0.24***    4351       17,344     4125
6-Year Grad    UROP            0.14***     4045       17,650     3684
6-Year Grad    Study Abroad    0.19***     4725       16,970     4239
6-Year Grad    COOP            0.17***     4351       17,344     4125
Note: Models for Internships, Minors, and the International Plan fail measures of model fit.
Now, when matching participants to non-participants, we see that
 UROP is associated with higher GPAs and a higher probability of graduating within 4 or 6 years.
 Study Abroad is associated with higher GPAs and a higher probability of graduating within 4 or 6 years.
 COOP is associated with higher GPAs and a higher probability of graduating within 6 years.
 But a lower probability of graduating within 4 years.
Propensity Score Matching - Example
Summarized Results

                   Regression   Linear Probability   PSM
GPA
  UROP             0.12***      n/a                  0.16***
  Study Abroad     0.06**       n/a                  0.14***
  COOP             0.13***      n/a                  0.12***
4-Year Grad
  UROP             n/a          0.13***              0.11***
  Study Abroad     n/a          0.06**               0.07***
  COOP             n/a          -0.27***             -0.24***
6-Year Grad
  UROP             n/a          0.14***              0.14***
  Study Abroad     n/a          0.10***              0.19***
  COOP             n/a          0.11***              0.17***

Note: Models for Internships, Minors, and the International Plan fail measures of model fit.
The coefficients are really close between the different models. That’s a good sign that the estimates are
relatively accurate.
If they differed greatly, there could be bias in the regressions or poor matches.
Time Series Analysis
 Regression, Logistic Regression, and Propensity Score Matching all assume there is no time
trend
 Participants and Non-Participants must be from the same cohort
 Only one time period in the data
 Time series analysis uses a type of regression to look for changes over time
 Time must be an independent variable
 Time series analysis is also known as longitudinal analysis
 Tracking the same individual over time is called panel analysis
 The type of panel analysis I’m going to teach is called Fixed Effects
Time Series Analysis
 Multiple Regression
𝑦 = β0 + β1 𝑋1 + β2 𝑋2 + β3 𝑋3 + … + ε
 Time Series Analysis
𝑦 = β0 + β1 𝑋1𝑖𝑡 + β2 𝑋2𝑖𝑡 + β3 𝑋3𝑖𝑡 + … + α𝑖 + δ𝑡 + ε𝑖𝑡
 ε is an error component
 α controls for individuals
 δ controls for time
 𝑖 is the identifier per individual
 t is the identifier per time period
 Remember how categorical variables were transformed into binary variables?
 We’re essentially doing the same for each time and individual.
 Year = 1999
 Year1997 = 0, Year1998 = 0, Year1999 = 1, Year2000 = 0 …
 ID = 103
 ID102 = 0, ID103 = 1, ID104 = 0 …
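The transformation above can be sketched as follows. The helper name `one_hot` is made up for illustration; the year and ID values are the ones shown on the slide.

```python
def one_hot(value, categories, prefix):
    """Return one 0/1 indicator per category; only the match is 1."""
    return {f"{prefix}{c}": int(c == value) for c in categories}

year_dummies = one_hot(1999, [1997, 1998, 1999, 2000], "Year")
id_dummies = one_hot(103, [102, 103, 104], "ID")

print(year_dummies)  # {'Year1997': 0, 'Year1998': 0, 'Year1999': 1, 'Year2000': 0}
print(id_dummies)    # {'ID102': 0, 'ID103': 1, 'ID104': 0}
```

In an actual regression, one year and one ID are left out as comparison groups, just as one category was left out when other categorical variables were converted.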
Time Series Analysis
 Time Series Analysis
𝑦 = β0 + β1 𝑋1𝑖𝑡 + β2 𝑋2𝑖𝑡 + β3 𝑋3𝑖𝑡 + … + α𝑖 + δ𝑡 + ε𝑖𝑡
 In essence, this looks for changes over time for a given individual
 Things that do not vary over time get dropped from the analysis because the individual controls (α𝑖) already absorb them
 Example interpretation: a $100,000 increase in spending on student services is associated with a 2 percentage point increase in graduation rates
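To see why controlling for individuals and time matters, here is a minimal sketch with one independent variable. Instead of building every dummy variable, it subtracts each individual's mean and each year's mean, which gives the same slope when every individual is observed in every year (a balanced panel). All names and numbers are invented for illustration, and the data are constructed so that the true within-institution slope is exactly 2.

```python
from collections import defaultdict

def mean(values):
    return sum(values) / len(values)

def ols_slope(xs, ys):
    """Slope from a simple regression of y on x (with an intercept)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def two_way_demean(rows, key):
    """Remove individual and time means (balanced panels only)."""
    by_id, by_t = defaultdict(list), defaultdict(list)
    for r in rows:
        by_id[r["id"]].append(r[key])
        by_t[r["t"]].append(r[key])
    grand = mean([r[key] for r in rows])
    return [r[key] - mean(by_id[r["id"]]) - mean(by_t[r["t"]]) + grand
            for r in rows]

# Hypothetical panel: y = 2*x + institution effect + year effect
alpha = {"A": 30.0, "B": 0.0, "C": 15.0}                     # individual effects
delta = {2001: 0.0, 2002: 1.0, 2003: 2.0, 2004: 3.0}         # time effects
x_values = {"A": [1, 2, 4, 3], "B": [5, 7, 6, 9], "C": [2, 2, 5, 8]}

rows = []
for inst, xs in x_values.items():
    for year, x in zip(sorted(delta), xs):
        rows.append({"id": inst, "t": year, "x": x,
                     "y": 2.0 * x + alpha[inst] + delta[year]})

pooled_slope = ols_slope([r["x"] for r in rows], [r["y"] for r in rows])
fe_slope = ols_slope(two_way_demean(rows, "x"), two_way_demean(rows, "y"))

print("Pooled regression slope:", round(pooled_slope, 3))
print("Fixed effects slope:    ", round(fe_slope, 3))
```

The pooled regression mixes differences between institutions into the slope, while the fixed effects version recovers the within-institution relationship that was built into the data.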
Time Series Analysis
[Chart: Basic Time Series; y-axis 0 to 25, x-axis years 2003 to 2015]
Time Series Analysis
[Chart: Basic Time Series with a single trend line, y = 1.1225x - 2244.7, R² = 0.3691; y-axis 0 to 25, x-axis years 2003 to 2015]
Time Series Analysis
[Chart: Time Series with Controls for Individual; one trend line per individual: y = 4x - 8035, y = 3.5x - 7008.5, y = 3x - 6022, y = 2x - 4006, y = 2x - 4012; y-axis 0 to 25, x-axis years 2003 to 2015]
Time Series Analysis - Example
Each column is associated with a different dependent variable (expenditures).
Time Series Analysis - Example
Within an institution, a $100 increase in state appropriations per FTE is associated with a
$26 increase in instruction after controlling for other sources of revenues and time trends.
Time Series Analysis - Example
More realistically, a $100 DECREASE in state appropriations per FTE is associated with a
$26 DECREASE in instructional expenses.
Time Series Analysis - Example
Not surprisingly, a strong link between tuition revenue and instructional expenses.
Time Series Analysis - Example
A similarly strong link between grants and contracts with research expenses.
Time Series Analysis - Example
Virtually no relationship between spending and retention rates.
Time Series Analysis - Example
Few relationships between spending and graduation rates (and marginal significance).
Time Series Analysis - Example
Increasing expenses for scholarships and fellowships is associated with lower 4-year graduation rates.
Time Series Analysis - Example
Increasing tuition is associated with a higher 6-year graduation rate.
Time Series Analysis - Example
Increasing tuition or reducing scholarships/fellowships
may motivate students to graduate faster so they
don’t have to pay more.
But probably not a good policy idea.
Time Series Analysis - Example
 Takeaways
 Time Series Analysis is very similar to OLS regression
 But it goes another step to control for time trends
 And it looks at changes within a unit of analysis over time
 This helps to move from a simple correlation
 (E.g. Institutions with large enrollments are associated with larger numbers of administrators)
 To a more robust analysis of changes for a unit over time
 (E.g. As enrollment increases by 100 students for an institution, staff/administration is expected to increase by 2)
 This helps control for cross-institutional differences
 (E.g. Looking at changing enrollment patterns for an institution rather than comparing enrollment patterns from institutions of different types or populations)
Break
Interpretation and Application
Interpretation and Application
 What are some examples of findings we’ve discovered?
 Correlational
 A 100 point increase in SAT scores is associated with a 0.12 higher freshman GPA after controlling for all other variables.
 A 1% increase in state appropriations is associated with a 0.8% increase in instructional expenses holding all else constant.
 A 0.1 increase in high school GPA is associated with a 9% increase in the probability of retaining after controlling for all other
variables.
 Quasi-Experimental
 Participation in the undergraduate research program is associated with a 14% increase in the probability of graduating within 6
years when compared to non-participants.
 Participation in study abroad is associated with higher earnings at graduation as compared to non-participants.
Interpretation and Application
 Which technique matches to each of these?
(Regression, Logistic Regression, Propensity Score Matching, or Time Series)
 Correlational
 A 100 point increase in SAT scores is associated with a 0.12 higher freshman GPA after controlling for all other variables. REGRESSION.
 A 1% increase in state appropriations is associated with a 0.8% increase in instructional expenses holding all else constant. TIME SERIES.
 A 0.1 increase in high school GPA is associated with a 9% increase in the probability of retaining after controlling for all other
variables. LOGISTIC REGRESSION.
 Quasi-Experimental
 Participation in the undergraduate research program is associated with a 14% increase in the probability of graduating within 6
years when compared to non-participants. PSM.
 Participation in study abroad is associated with higher earnings at graduation as compared to non-participants. PSM.
Interpretation and Application
 What are some key phrases to use and understand in the interpretation and application?
“Is Associated”
We can’t say anything about causation, so we can’t use causal language. We can’t say X caused Y.
Instead, say that changing X is associated with a change in Y.
Interpretation and Application
 What are some key phrases to use and understand in the interpretation and application?
“After controlling for…” or “Holding all else constant”
There may be other variables that were unaccounted for, so we have to say that the observed
coefficient is based on controlling for these certain variables.
Interpretation and Application
 What are some key phrases to use and understand in the interpretation and application?
“When compared to non-participants”
Just like with “holding all else constant”, we have to specify that these results are based only on
comparing participants with the non-participants we selected as their matches.
Interpretation and Application
 What’s the difference between statistical significance and practical significance?
 Statistical Significance
 Basically just a p-value of less than 0.05
 Suggests there is a relationship between the independent variable (𝑋) and the dependent variable (𝑦)
 Practical Significance
 This varies by variable; it looks at the magnitude of β
 Most magnitudes in social sciences and behavioral studies will be between 0-5%
 Some magnitudes are not feasible
 Example
 A 3 percentage point increase in overall graduation rates is a major difference
 Moving from 78% graduating within 4 years to 81% graduating within 4 years
 A 3 percentage point increase in a student’s probability of graduation is small
 Moving a person from a 91% probability of graduating within 4 years to a 94% probability is a small change
 A $10,000 increase in funding per FTE is associated with a 5% increase in public service participation
 Doesn’t make sense if current funding levels are $6,000 per FTE
 Is a 5% increase in public service participation worth the extra funding?
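Because the examples above mix "percentage point" and "percent" changes, a quick calculation can help keep the two apart. The sketch below uses the 78% to 81% graduation rate example from this slide and the 2-to-5 student enrollment example from earlier in the workshop; the function names are made up for illustration.

```python
def percentage_point_change(before_pct, after_pct):
    """Simple difference between two rates expressed in percent."""
    return after_pct - before_pct

def relative_percent_change(before, after):
    """Change expressed as a percent of the starting value."""
    return (after - before) / before * 100

# Graduation rate moving from 78% to 81%
pp = percentage_point_change(78, 81)
rel = relative_percent_change(78, 81)
print(pp, "percentage points, or", round(rel, 1), "percent of the starting rate")

# Enrollment moving from 2 students to 5
tiny_base = relative_percent_change(2, 5)
print(tiny_base, "percent increase from a base of only 2 students")
```

The same 3 percentage points is a larger relative change at a low starting rate than at a high one, and a small denominator can make a trivial change look enormous, so it helps to report which measure is being used.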
Interpretation and Application
 What about translating into non-statistical speak?
 Be careful not to overstate your findings
 Consider both the statistical and practical significance
 Keep it simple and straightforward in the executive summary
 Add the details and statistical jargon in an appendix
Interpretation and Application
 What about translating into non-statistical speak?
Appendix 1a. Results of Propensity Score Matching
 “The results of this study indicate that the Undergraduate Research Opportunities Program (UROP),
study abroad, and Co-Op programs are all very successful in improving student outcomes when
comparing participants to non-participants of a similar profile.”
Interpretation and Application
 So we should push all of our students to participate in these programs?
 Not necessarily. While these programs likely helped the students who participated, a number of factors could still be biasing the results.
 This only says that participants likely did better than they would have had they not participated.
 Statistical analyses are about average effects, not the effect for any one individual. Most students
benefitted (on average) while there may have been others that did not fare as well.
 Some students may not benefit from any or all of these programs.
Interpretation and Application
 Thoughts on when to use a table versus a graph?
 Most people don’t know how to interpret the regression coefficients in a table
 But for those who do, a table provides all the statistical details
 Graphs can easily display a lot of information
 Graphs are very helpful when looking at trends over time
Interpretation and Application
 When to use a table versus a graph.
 Table – when coefficients and statistical significance are most important
 Line Graph – when looking at continuous changes over time
 Bar Graph – when looking at categorical data
 Pie Graph – when looking at proportions of a whole
 Scatterplot – when graphing raw data and the best-fit line
 Interactive Graph – when using special software that allows animations
Interpretation and Application
RAW DATA
Year    In-State Tuition
2000    $10,500
2001    $11,550
2002    $12,359
2003    $13,347
2004    $14,548
2005    $14,694
2006    $15,576
2007    $16,822
2008    $17,999
2009    $19,259
2010    $19,644
2011    $20,430
2012    $21,043
2013    $21,043
2014    $21,253
2015    $22,741
SUMMARY OUTPUT

Regression Statistics
  Multiple R           0.990842
  R Square             0.981769
  Adjusted R Square    0.980466
  Standard Error       545.8353
  Observations         16

ANOVA
               df    SS          MS          F         Significance F
  Regression   1     2.25E+08    2.25E+08    753.902   1.41E-13
  Residual     14    4171107     297936.2
  Total        15    2.29E+08

                     Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
  Intercept          -1614630       59426.33         -27.1703   1.63E-13   -1742087    -1487173
  In-State Tuition   812.7924       29.60208         27.45728   1.41E-13   749.3023    876.2825

In-State Tuition is increasing by roughly $800 per year.
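For readers who want to check the Excel output by hand, the sketch below fits the same straight line to the raw data on this slide using the textbook least-squares formulas. The function name is made up for illustration, and because the tuition figures shown are rounded to whole dollars, the results may differ slightly from the Excel output in the last digits.

```python
years = list(range(2000, 2016))
tuition = [10500, 11550, 12359, 13347, 14548, 14694, 15576, 16822,
           17999, 19259, 19644, 20430, 21043, 21043, 21253, 22741]

def fit_line(xs, ys):
    """Least-squares slope, intercept, and R-squared for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared

slope, intercept, r_squared = fit_line(years, tuition)
print("Slope (dollars per year):", round(slope, 2))
print("Intercept:", round(intercept))
print("R-squared:", round(r_squared, 4))
```

The slope is the number that supports the plain-language statement that tuition is rising by roughly $800 per year; the large negative intercept is just the line extended back to year 0 and has no practical meaning.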
Interpretation and Application
[Chart: In-State Tuition by year, 2000 to 2015 (same raw data as the previous slide); y-axis $0 to $25,000; trend line y = 812.79x + 10142]
In-State Tuition is increasing by roughly $800 per year.
Interpretation and Application
[Charts: two charts titled "Enrollment" showing enrollment by group (Asian, Black, Hispanic, Other, White, Total) for 2012, 2013, and 2014; y-axis 0 to 25,000]

Year        2012     2013     2014
Asian       6570     6688     7466
Black       1273     1289     1386
Hispanic    1341     1428     1533
Other       743      779      974
White       11630    11287    11750
Total       21557    21471    23109

These are technically the correct ways to display this information.
Interpretation and Application
[Chart: "Enrollment" for the same groups and years (same data table as the previous slide); y-axis 0 to 25,000]
But if you want to fudge the rules on continuous data a bit, you could display it like this.
http://enrollment.irp.gatech.edu/
Interpretation and Application
RAW DATA
                           Expenditures per FTE
Instruction                $13,939
Research                   $31,454
Public Service             $2,589
Academic Support           $2,632
Institutional Support      $3,300
Student Services           $1,717
Other Core Expenditures    $604

[Pie chart: Expenditures per FTE by category; slice labels are 56%, 25%, 6%, 5%, 4%, 3%, and 1%]
Takeaways
We’ve Learned…
 How to develop a good research question
 How bias can arise in simple comparisons
 Self-Selection
 Population versus Sample
 Spurious Relationships & Omitted Variables
 Simultaneity & Reverse Causality
 History
 About experimental designs
 Causality
 Random Assignment
 Treatment & Control Groups
 Pre-Test & Post-Test
 About quasi-experimental designs
 Regression
 Multivariate Regression
 Logistic Regression
 Propensity Score Matching
 Time Series & Longitudinal Analyses
 About research ethics
 About mathematical models
 Data Types
 Confidence Intervals
 t-tests
 Designs in Excel
 Designs in SAS
 Examples
 Exercises
 About interpretation and application
 Coefficients
 Statistical & Practical Significance
 Graphs & Tables
Takeaways
 What was the most important takeaway for you?
 What was most helpful?
 What needed additional attention?
 What questions have not been answered?
Additional Resources
Coelli, T. J., Rao, D. S. P., O’Donnell, C. J., & Battese, G. E. (2005). An introduction to efficiency and
productivity analysis (2nd ed.). New York, NY: Springer.
Schneider, B., Carnoy, M., Kilpatrick, J., Schmidt, W. H., & Shavelson, R. J. (2007). Estimating
causal effects: Using experimental and observational designs. Washington, DC: American
Educational Research Association.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2001). Experimental and quasi-experimental
designs for generalized causal inference (2nd ed.). Boston, MA: Houghton Mifflin.
Wooldridge, J. M. (2009). Introductory econometrics: A modern approach (4th ed.). Mason, OH:
South-Western Cengage Learning.
This one is a short handbook that would be a good, quick reference.
Correlation, Causation, & Evaluation
A PRACTITIONER’S GUIDE TO RESEARCH METHODS
JUSTIN C. SHEPHERD, PH.D.
JUSTIN.SHEPHERD@IRP.GATECH.EDU