Linear Regression Assignment
Jose Fernandez
MSBA Data Analytics III

In this homework assignment you will use linear regression to study speeding tickets. Each question builds on the previous question, so your regressions should include more controls as you move through the assignment. Try to capture all of these regressions in one nicely formatted table.

What determines how much drivers are fined if they are stopped for speeding? Do demographics like age, gender, and race matter? To answer this question, we’ll investigate traffic stops and citations in Massachusetts using data from Makowsky and Stratmann (2009). Even though state law sets a formula for tickets based on how fast a person was driving, police officers in practice often deviate from the formula. An amount for the fine is given only for observations in which the police officer decided to assess a fine.

a) Plot a histogram of fines. Does it look normally distributed or skewed?

b) Estimate a simple linear regression model in which the ticket amount is the dependent variable as a function of age. Is age statistically significant?

c) What does it mean for a variable to be endogenous? Is age possibly endogenous? Please explain your answer.

d) Estimate the model from part b), also controlling for miles per hour over the speed limit. Explain what happens to the coefficient on age and why.

e) Is the effect of age on fines linear or non-linear? Assess this question by estimating a model with a quadratic age term, controlling for MPHover, Female, Black, and Hispanic. Interpret the coefficients on the age variables.

f) Sketch the relationship between age and ticket amount from the foregoing quadratic model: calculate the fitted value for a white male with 0 MPHover (probably not many people going zero miles over the speed limit got a ticket, but this simplifies calculations a lot) for ages equal to 20, 25, 30, 35, 40, and 70. Use R to calculate these values and plot them.
g) Calculate the age that is associated with the lowest predicted fines. Hint: You can use calculus or a simple formula used to find the minimum and maximum of quadratic functions.

h) Do drivers from out of town and out of state get treated differently? Do state police and local police treat nonlocals differently? Estimate a model that allows us to assess whether out-of-towners and out-of-staters are treated differently and whether state police respond differently to out-of-towners and out-of-staters. Interpret the coefficients on the relevant variables. Hint: you have to do something more than just including the dummy variables.

i) Test whether the two state police interaction terms are jointly significant. Briefly explain your results. Hint: it says jointly, so it is not a t-test.

Variable Name    Description
MPHover          Miles per hour over the speed limit
Amount           Assessed fine for the ticket
Age              Age of driver
Female           Equals 1 for women and 0 for men
Black            Equals 1 for African-American and 0 otherwise
Hispanic         Equals 1 for Hispanics and 0 otherwise
State Pol        Equals 1 if ticketing officer was state patrol officer and 0 otherwise
OutTown          Equals 1 if driver from out of town and 0 otherwise
OutState         Equals 1 if driver from out of state and 0 otherwise
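
The hint in part g) about the minimum or maximum of a quadratic can be written out as a short derivation. This is only a generic sketch: the symbols \beta_0, \beta_1, and \beta_2 are assumed names for the intercept and the coefficients on Age and Age squared in the model from part e), not notation taken from the assignment, and the other controls are held fixed.

```latex
% Assumed notation: \beta_1 is the coefficient on Age, \beta_2 on Age^2.
% The remaining controls (MPHover, Female, Black, Hispanic) are held fixed.
\begin{align*}
\widehat{\text{Amount}} &= \beta_0 + \beta_1\,\text{Age} + \beta_2\,\text{Age}^2 + \dots \\
\frac{\partial\,\widehat{\text{Amount}}}{\partial\,\text{Age}} &= \beta_1 + 2\beta_2\,\text{Age} = 0 \\
\text{Age}^{*} &= -\frac{\beta_1}{2\beta_2}
\end{align*}
```

The turning point Age* is a minimum when \beta_2 > 0 and a maximum when \beta_2 < 0, so the sign of the estimated coefficient on the squared term should be checked before interpreting the result as the age with the lowest predicted fine.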