
SIMPLE CLASSICAL LINEAR REGRESSION MODEL
STATISTICAL MODEL
The next step in an empirical study is to specify a statistical model. The statistical model
describes the data generation process. It is supposed to describe the true economic relationship in
the population that produced the data.
Statistical Model as a Reasonable Approximation
The statistical model should be a reasonable approximation of the true unknown data generation
process. If the model is not a reasonable approximation, then it is very likely that the conclusions
we draw from the data about the economic relationship of interest will be incorrect.
SIMPLE CLASSICAL LINEAR REGRESSION MODEL
The simple classical linear regression model (SCLRM) is a model that describes an economic
data generation process. It is “simple” because it has a dependent variable and one explanatory
variable. If we believe the SCLRM is a reasonable approximation of the true data generation
process, then we should adopt it for our empirical study; if not, we shouldn’t.
SPECIFICATION OF THE MODEL
An econometric model is a model that describes the statistical relationship between two or more
economic variables. It can be written in general functional form as Yt = ƒ(Xt1, Xt2…Xtk) + μt. To
specify an econometric model, we make a set of assumptions about the economic relationship of
interest. There are three types of assumptions. 1) An assumption about the variables involved in
the relationship. 2) An assumption about the functional form of the relationship. 3) Assumptions
about the error term.
VARIABLES
The SCLRM assumes the economic relationship involves two variables, a dependent variable (Y)
and one explanatory variable (X). X is the only important variable that affects Y. All other
variables that affect Y are unimportant. These other variables are included in the error term, μ. μ
represents the net effect of all factors other than X that affect Y. Given this assumption, we have
Yt = ƒ(Xt) + μt.
Random and Nonrandom Variables
The variables Y and μ are random variables. The variable X can be either a random variable or a
nonrandom variable.
Random Variable
A random variable is a variable whose value is uncertain. The value of a random variable is
determined by the outcome of an experiment. Before the experiment takes place, the value that a
random variable will take is unknown and cannot be predicted with certainty. A random variable
can be discrete, continuous, observable, or unobservable.
Nonrandom Variable
A nonrandom variable is a variable whose value is known with certainty before an experiment
takes place.
Marginal, Joint, and Conditional Probability Distributions
The behavior of a random variable is described by a probability distribution. A discrete random
variable, X, has a discrete probability distribution ƒ(X). ƒ(X) is a rule that assigns to each value
of X one and only one probability weight, such that the sum of the probability weights equals one.
A continuous random variable, Y, has a probability density function ƒ(Y). ƒ(Y) allows you to
calculate a probability weight for any interval of Y-values as the area under the graph of ƒ(Y)
corresponding to this interval. The total area under the graph of ƒ(Y) is equal to one. A
probability distribution has 3 important characteristics. 1) Mean (expected value). 2) Variance. 3)
Standard deviation. The mean of a random variable is a probability weighted average of the
values of the random variable. For a discrete random variable, X, the mean is: μ = E(X) =
∑ Xi ƒ(Xi). For a continuous random variable, Y, the mean is: μ = E(Y) = ∫ Y ƒ(Y) dY. The
variance of a random variable is a probability weighted average of the squared mean deviations.
For a discrete random variable (X) and a continuous random variable (Y) the variance is: σ2 = Var(X) = E[(X – μ)2]
= ∑ (Xi – μ)2 ƒ(Xi), and σ2 = Var(Y) = E[(Y – μ)2] = ∫ (Y – μ)2 ƒ(Y) dY. The standard deviation, σ,
is the square root of the variance. Shorthand notation for the probability distribution of a random
variable X is: X ~ (μx, σx2). This is read, “X has a distribution with mean μx and variance σx2.”
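To make the mean and variance formulas concrete, here is a minimal Python sketch that computes the probability-weighted mean, variance, and standard deviation of a discrete random variable. The values and probability weights are hypothetical, chosen only for illustration.

    # Mean, variance, and standard deviation of a discrete random variable X,
    # computed as probability-weighted averages (hypothetical values and weights).
    values = [1.0, 2.0, 3.0, 4.0]
    weights = [0.1, 0.2, 0.3, 0.4]          # probability weights; must sum to one

    mean = sum(w * x for x, w in zip(values, weights))
    variance = sum(w * (x - mean) ** 2 for x, w in zip(values, weights))
    std_dev = variance ** 0.5

    print(mean, variance, std_dev)          # shorthand: X ~ (mean, variance)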
Joint Probability Distribution
The behavior of a bivariate or multivariate random variable can be described by a joint
probability distribution. A discrete bivariate random variable (X, Y) has a discrete joint probability
distribution ƒxy(X,Y). ƒxy(X,Y) is a rule that assigns to each pair of values of X and Y one and
only one probability weight, such that the sum of the probability weights equals one. A continuous
bivariate random variable (X, Y) has a continuous joint probability density function ƒxy(X,Y).
ƒxy(X,Y) allows you to calculate a probability weight for any given interval of X and Y-values as
the volume under the surface of ƒxy(X,Y) corresponding to this interval. The total volume under
the surface of ƒxy(X,Y) is equal to one. A joint probability distribution has 4 important
characteristics. 1) Mean of X and mean of Y. 2) Variance of X and variance of Y. 3) Standard
deviation of X and standard deviation of Y. 4) Covariance of X and Y.
Conditional Probability Distribution
The behavior of a bivariate or multivariate random variable can also be described by a conditional
probability distribution. A discrete bivariate random variable (X, Y) has discrete conditional
probability distributions (Y|X) and (X|Y). (Y|X) is a rule that assigns to each value of Y one
and only one probability weight for a given value of X, such that the sum of the probability
weights equal one. Note that: (Y|X) = xy(X,Y) / x(X). A continuous bivariate random variable
(X, Y) has continuous conditional probability density functions (Y|X) and (X|Y). (Y|X)
allows you to calculate a probability weight of any interval of Y-values for a given X value as the
area under the graph of (Y|X) corresponding to this interval for a given value of X. The total
area under the graph of (Y|X) for each value of X is equal to one. A conditional probability
distribution has 3 important characteristics. 1) Conditional mean of Y given X, denoted E(Y|X).
The conditional variance of Y given X, denoted Var(Y|X). The conditional standard deviation of
given X, denoted Sd(Y|X).
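The relationship ƒ(Y|X) = ƒxy(X,Y) / ƒx(X) and the conditional mean E(Y|X) can be illustrated with a short Python sketch. The joint probability table below is hypothetical, chosen only for illustration.

    # Conditional distribution f(Y|X) = f_xy(X, Y) / f_x(X) for a discrete
    # bivariate random variable, using a hypothetical joint probability table.
    joint = {                               # f_xy(x, y); the weights sum to one
        (0, 10): 0.10, (0, 20): 0.30,
        (1, 10): 0.40, (1, 20): 0.20,
    }

    def marginal_x(x):
        # marginal probability f_x(x): sum the joint weights over all y
        return sum(p for (xi, yi), p in joint.items() if xi == x)

    def conditional_y_given_x(y, x):
        # conditional probability f(y | x)
        return joint[(x, y)] / marginal_x(x)

    # Conditional mean E(Y | X = 1): probability-weighted average of the Y values
    x = 1
    e_y_given_x = sum(y * conditional_y_given_x(y, x) for (xi, y) in joint if xi == x)
    print(e_y_given_x)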
Conditional Mean Function
We want to analyze the statistical relationship between two random variables, X and Y. We want
to know if X has an effect on average Y in the population, and if so the direction and size of this
effect. Therefore we are interested in the relationship between E(Y|X) and X. This relationship
can be expressed as a conditional mean function E(Y|X) = ƒ(X).
Possible Violations of Assumption
The SCLRM assumes that X is the only important variable that affects Y. This assumption is
violated if one or more variables other than X have an important effect on Y. If this assumption is
violated, then the SCLRM may not be a reasonable approximation of the true data generation
process.
FUNCTIONAL FORM
The SCLRM assumes that the conditional mean function, also called the population regression
function, is linear: E(Y|X) = ƒ(X) = α + βX, where α and β are called parameters. A parameter is
a quantifiable characteristic of a population. It is always unknown and unobservable.
Interpretation of Parameters
The parameter α is the vertical intercept of the line. It measures the average value of Y when X =
0. This parameter is usually not of primary interest. The parameter β is the slope parameter. It is
given by β = ΔE(Y|X) / ΔX. It measures the change in average Y when X changes by one unit. It
is the marginal effect of X on Y. This parameter is of primary interest. The 3 important questions
we want to address can be restated as follows, with each question recast in terms of β in parentheses. 1) Does X affect Y? (Is β zero or
nonzero?) 2) What is the direction of the effect? (Is the algebraic sign of β positive or negative?)
3) What is the size of the effect? (What is the magnitude of β?).
Meaning of Linearity
The SCLRM assumes that the functional form is linear in parameters. This allows us to choose
from a wide variety of functional forms such as double-log, exponential, quadratic, etc.
Possible Violations of Assumption
Two ways the assumption of linear functional form can be violated. 1) The true functional form is
nonlinear in parameters. 2) We choose the wrong linear in parameters functional form. If the
assumption of linearity is violated, then the SCLRM may not be a reasonable approximation of the
true data generation process.
ERROR TERM
The SCLRM assumes that the observed value of Y for the tth unit (Yt) has two components. 1)
Systematic component. 2) Random component. The systematic component is represented by the
conditional mean function. The random component is represented by the error term. Therefore,
the statistical relationship between Yt and Xt is given by Yt = α + βXt + μt. The error term is a
random variable that represents the “net effect” of all factors other than Xt that affect Yt for the
tth unit in the population. By definition, the error term measures the deviation in Yt from the
mean for the tth unit: μt = Yt – (α + βXt). The error term μt is unknown and unobservable. This is
because the parameters α and β are unknown and unobservable.
Assumptions
The error term is an unobservable random variable. We can describe its behavior by a conditional
probability distribution ƒ(μ|X). For each value of X there is a probability distribution for μ. The
SCLRM makes the following assumptions about the conditional probability distributions of the
error term.
1. Error Term Has Mean Zero: E(μt|Xt) = 0
2. Error Term is Uncorrelated with the Explanatory Variable: Cov(μt, Xt) = 0
3. Error Term Has Constant Variance: Var(μt|Xt) = σ2
4. Errors Are Independent: Cov(μt, μs) = 0
5. Error Term Has a Normal Distribution: μt ~ N
Possible Violations of Assumptions
Assumptions 1 and 2 may be violated by confounding variables and reverse causation.
Assumption 3 is violated if the conditional variance of Y depends on X. Assumption 4 may be
violated for nonrandom samples and time series data. If any assumptions about the error term are
violated, then the SCLRM may not be a reasonable approximation of the true data generation
process.
RELATIONSHIP BETWEEN ERROR TERM AND DEPENDENT VARIABLE
The assumptions about the conditional probability distribution of the error term ƒ(μ|X) imply
assumptions about the conditional probability distribution of the dependent variable ƒ(Y|X). This
is because Y and μ are random variables, and Y is a linear function of μ. In this case, if we know
the properties of μ we can deduce the properties of Y.
SIMPLE CLASSICAL LINEAR REGRESSION MODEL CONCISELY STATED
The SCLRM can be written concisely in any of the following 4 ways.
Yt = α + βXt + μt
E(μt|Xt) = 0
Var(μt|Xt) = σ2
Cov(μt, μs) = 0
μt ~ N
Yt = α + βXt + μt
E(Yt|Xt) = α + βXt
Var(Yt|Xt) = σ2
Cov(Yt, Ys) = 0
Yt ~ N
Yt = α + βXt + μt
μt ~ iid N(0, σ2)
Yt ~ iid N(α + βXt, σ2)
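As an illustration of the data generation process the SCLRM describes, the following Python sketch simulates a sample from Yt = α + βXt + μt with μt ~ iid N(0, σ2). The parameter values and sample size are hypothetical, chosen only for illustration.

    # Simulate data from the SCLRM: Yt = alpha + beta*Xt + mu_t, mu_t ~ iid N(0, sigma^2).
    # Hypothetical parameter values, for illustration only.
    import numpy as np

    rng = np.random.default_rng(0)
    n, alpha, beta, sigma = 100, 2.0, 0.5, 1.0

    X = rng.uniform(0, 10, size=n)          # explanatory variable
    mu = rng.normal(0, sigma, size=n)       # error term, iid N(0, sigma^2)
    Y = alpha + beta * X + mu               # dependent variable

    print(X[:3], Y[:3])                     # first few simulated observations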
ESTIMATION
The next step in an empirical study is to use the sample data to obtain estimates of the parameters
of the statistical model: Yt = α + βXt + μt; μt ~ iid N(0, σ2). The model has 3 unknown and
unobservable parameters: α, β, and σ2.
TYPES OF ESTIMATES
We can make two types of estimates. 1) Point estimate. 2) Interval estimate. A point estimate is
an estimate that is a single number. An interval estimate is an estimate that is an interval of
numbers within which we expect to find the true value of the parameter, with some given
probability. It is a conclusion in the form of a probability statement. It indicates the degree of
certainty or confidence we can have in our point estimate. An interval estimate is also called a
confidence interval.
ESTIMATOR
To obtain estimates of the parameters, we must choose an estimator. An estimator is a rule that
tells us how to use the sample data to obtain estimates of a population parameter. Let α, β, and σ2
be population parameters. We will designate the estimators for these parameters with a hat: α^, β^,
and σ2^.
Accurate and Reliable Estimator
We want to choose an estimator that is accurate and reliable. An accurate and reliable estimator
will produce an estimate that is close to the true value of the population parameter. Our goal is to
get the best possible estimate with our sample data. The best possible estimate is one that comes
as close as possible to the true population parameter.
How Do We Know if an Estimator is Accurate and Reliable?
To determine if an estimator is accurate and reliable, we need to describe the behavior of the
estimator from sample to sample.
Behavior of an Estimator
Let β^ be the estimator of β. We know two things about the behavior of an estimator. 1) For any
given sample, the estimate β^ will probably not equal the true value of β. The difference β^ – β is
estimation error. 2) The estimates will be different for different samples.
An Estimator is A Random Variable
The estimator β^ is a random variable, because its value is uncertain from sample to sample.
Sampling Distribution
Because β^ is a random variable, we can describe its behavior by a probability distribution called
a sampling distribution. The sampling distribution for β^ can be derived from an experiment or
using probability theory.
Mean and Variance of Sampling Distribution
The sampling distribution of β^ has a mean and a variance (or standard deviation). The mean tells
us the average of the estimates from sample to sample. The variance tells us how spread-out the
estimates are from sample to sample.
Form of Sampling Distribution
The sampling distribution of β^ will have some specific form. The form of the sampling
distribution illustrates the long-run pattern of β^ from sample to sample.
Standard Error of the Estimator
The standard deviation of the sampling distribution of β^ is called the standard error of β^. Why?
If the mean of the sampling distribution of β^ equals the true value of β, then the standard
deviation of β^ tells us the average error of the estimate for a large number of samples.
Definition of an Accurate and Reliable Estimator
To define an accurate and reliable estimator we use the mean and variance of the sampling
distribution of the estimator. We use the mean to define accuracy. We use the variance (standard
error) to define reliability. For an estimator to be accurate and reliable, the mean and variance
must have specific properties.
Small Sample Properties
The small sample properties of an estimator are the properties that the mean and variance of the
sampling distribution of an estimator must have for any finite sample size. For an estimator to be
accurate and reliable, it must have two small sample properties. 1) Unbiasedness (Accuracy). 2)
Efficiency (Reliability).
Unbiasedness
An estimator is accurate if it is unbiased. An estimator is unbiased if the mean of the sampling
distribution is equal to the true value of the population parameter being estimated: E(β^) = β. An
estimator is inaccurate if it is biased. It is biased if the mean of the sampling distribution does not
equal the true value of the parameter. An estimator is biased upward (downward) if the mean is
above (below) the true value of the parameter; that is, E(β^) > β and E(β^) < β. A biased estimator
systematically overestimates or underestimates the parameter from sample to sample.
Efficiency
An estimator is most reliable if it is efficient. An estimator is efficient if it has minimum variance
in the class of unbiased estimators. If there are two unbiased estimators of β, denoted β1^ and β2^,
then β1^ is the efficient estimator if Var(β1^) < Var(β2^).
Sampling (Random) Error
An efficient estimator has the smallest sampling error of any unbiased estimator. Sampling error
is random estimation error that occurs from sample to sample, because different samples have
different subsets of units from the population. It is the result of chance.
Standard Error of Estimator and Precision
An efficient estimator has the smallest standard error of any unbiased estimator, and therefore the
smallest average estimation error. Therefore, an efficient estimator will be the most precise
estimator.
Large Sample Properties
Sometimes we don’t know the sampling distribution of an estimator for a finite sample, or we
can’t find an estimator that is unbiased and/or efficient in finite samples. In these instances, to
define an accurate and reliable estimator we use the mean and variance of the large sample
(asymptotic) sampling distribution of the estimator.
Large Sample Properties
The large sample properties of an estimator are the properties that the mean and variance of the
sampling distribution of an estimator must have as the sample size becomes infinitely large. For
an estimator to be accurate and reliable, it must have one large sample property: consistency. An
estimator is a consistent estimator of a population parameter if the sampling distribution of the
estimator collapses to the true value of the population parameter as the sample size becomes
infinitely large. A consistent estimator has two properties. 1) If the estimator is biased, as the
sample size increases the bias decreases. 2) As the sample size increases, the variance decreases.
Thus, the larger the sample size, the smaller the systematic and random estimation error of the
estimator.
Properties of an Estimator: Conclusions
The most accurate and reliable estimator is an estimator that is unbiased, efficient, and consistent.
This estimator will have no systematic error and minimum sampling error for any given size
sample you might use to obtain your estimate of β. The larger your sample, the smaller the
sampling error. Therefore, for a given sample of a given size, it will produce an estimate that is as
close as possible to the true unknown value of the population parameter.
ESTIMATOR FOR α AND β
To obtain estimates of the regression coefficients, α and β, we will use the ordinary least squares
(OLS) estimator. The ordinary least squares estimator tells us to choose as our estimates of α and
β the numbers that minimize the residual sum of squares for the sample.
Residual Sum of Squares Function
To calculate the OLS estimates for α and β, we need to find the unknown values, α^ and β^, that
minimize a residual sum of squares function for the sample data. The residual for the tth
observation is defined as μt^ = Yt – α^ – β^Xt. The residual sum of squares function is given by
RSS(α^, β^) = ∑(Yt – α^ – β^Xt)2, where RSS denotes residual sum of squares. Because the
sample is given, we know the values of Yt and Xt, and therefore they are treated as constants.
Because we don’t know the values of α^ and β^, they are treated as unknowns.
Deriving the OLS Estimators for α and β
To derive the OLS estimators for α and β, we find the values of α^ and β^ that minimize the
function RSS. This yields the following system of two equations in two unknowns
∑Yt = nα^ + β^∑Xt
∑XtYt = α^∑Xt + β^∑Xt2
These are called the normal equations. The normal equations are the first-order necessary
conditions for minimizing the residual sum of squares function. We can solve these two
equations sequentially. Doing so yields the following expressions for α^ and β^.
β^ = ∑(Xt – XBAR)(Yt – YBAR) / ∑(Xt – XBAR)2 = Covariation(X,Y) / Variation(X)
= Cov(X,Y) / Var(X)
α^ = YBAR – β^ XBAR
These are the OLS estimators for α and β for the SCLRM. They are rules that tell us how to use the
sample data to obtain estimates of the unknown population parameters.
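A minimal Python sketch of these OLS formulas, applied to a small hypothetical data set, is given below.

    # OLS estimates of alpha and beta using the formulas above:
    # beta_hat = sum((Xt - Xbar)(Yt - Ybar)) / sum((Xt - Xbar)^2)
    # alpha_hat = Ybar - beta_hat * Xbar
    import numpy as np

    def ols_simple(X, Y):
        x_bar, y_bar = X.mean(), Y.mean()
        beta_hat = np.sum((X - x_bar) * (Y - y_bar)) / np.sum((X - x_bar) ** 2)
        alpha_hat = y_bar - beta_hat * x_bar
        return alpha_hat, beta_hat

    # Hypothetical sample data, for illustration only
    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
    alpha_hat, beta_hat = ols_simple(X, Y)
    print(alpha_hat, beta_hat)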
Are the OLS Estimators for α and β Accurate and Reliable?
To determine if the OLS estimators α^ and β^ are accurate and reliable, we need to derive their
sampling distributions and analyze their small and large sample properties.
Form of the Sampling Distributions
The SCLRM assumes that the error term has a normal distribution. Given this assumption, it
follows that the sampling distributions of the OLS estimators α^ and β^ are normal
distributions. Therefore, α^ ~ N(Mean, Variance) and β^ ~ N(Mean, Variance).
Mean of the Sampling Distributions
The SCLRM assumes that the error term has mean zero, and therefore the error term is
uncorrelated with the explanatory variable. Given these assumptions, it follows that the means of
the sampling distributions are E(α^) = α and E(β^) = β. Therefore, α^ ~ N(α, Variance) and
β^ ~ N(β, Variance).
Variances, Covariance, and Standard Deviations of the Sampling Distributions
The SCLRM also assumes that the error term has constant variance and the errors are
independent. Given the assumptions that the error term has mean zero, constant variance, and the
errors are independent, it follows that the variances and covariance of the sampling distributions
are
Var(β^) = σ2 / ∑(Xt – XBAR)2
Var(α^) = [(∑Xt2) / n] [σ2 / ∑(Xt – XBAR)2]
Cov(α^, β^) = –XBAR [σ2 / ∑(Xt – XBAR)2]
Therefore, β^ ~ N(β, σ2 / ∑(Xt – XBAR)2) and α^ ~ N(α, [(∑Xt2) / n] [σ2 / ∑(Xt – XBAR)2]). The
square roots of the variances are the standard deviations of the sampling distributions, called the
standard errors of the estimators, and denoted s.e.(α^) and s.e.(β^).
Variance/Covariance Matrix of Estimates
It is standard practice to use the variances and covariances to construct what is called a
variance-covariance matrix of estimates. The general form of this matrix is as follows.
Cov = [ Var(α^)        Cov(α^, β^) ]
      [ Cov(α^, β^)    Var(β^)     ]
Small Sample Properties
Because E() =  and E() = , the OLS estimators are unbiased. It can also be shown that
in the class of linear unbiased estimators the OLS estimators have minimum variance, and
therefore they are efficient. The small sample properties can be summarized by the GaussMarkov theorem: “Given the assumptions of the classical linear regression model, the OLS
estimators  and  are the best linear unbiased estimators of the population parameters  and
.”
Large Sample Properties
It can be shown that the large sample (asymptotic) sampling distributions of the OLS estimators
are normal distributions, and that the OLS estimators are consistent.
Conclusions about OLS Estimators
If the assumptions of the CLRM are satisfied, and therefore the CLRM is a reasonable
approximation of the true data generation process, then the OLS estimators of α and β are the best
estimators. This is because we can’t find an alternative estimator that produces more accurate
and reliable estimates than OLS; that is, estimates that will consistently come closer to the true
values of the population parameters. However, the following caveats must be noted.
Best May Not Be Very Good
The OLS estimator will be more reliable and precise than any other estimator. However, this may
not be very reliable and precise. The smaller the variation in X and/or the larger the error
variance, the bigger the variance of the sampling distributions of α^ and β^, and therefore the less
reliable and precise the estimates.
Bias in the OLS Estimator
The OLS estimator will be biased if the error term does not have mean zero. The error term will
not have mean zero if it is correlated with the explanatory variable. The error term will be
correlated with the explanatory variable if there are omitted confounding variables, reverse
causation, sample selection problems or measurement error in the explanatory variable.
ESTIMATOR FOR σ2
To obtain an estimate of the error variance σ2, we will use the estimator: σ2^ = RSS / df
= RSS / (n – k), where n is the sample size, and k is the number of regression coefficients (k = 2 for the SCLRM).
Standard Error of the Regression
The standard error of the regression is the square root of the estimate of the error variance. It
measures how spread-out the data points are around the regression line. It is an estimate of the
standard deviation of Y after the effect of X has been taken out.
ESTIMATOR FOR THE VARIANCE, COVARIANCE, AND STANDARD ERROR OF THE
OLS ESTIMATORS α^ AND β^
To obtain measures of the precision of the point estimates, construct interval estimates, and test
hypotheses, we need to obtain estimates of the variances, covariances, and standard errors of the
sampling distributions of the OLS estimators α^ and β^. We will use the following estimators.
Var(β^)^ = σ2^ / ∑(Xt – XBAR)2
Var(α^)^ = [(∑Xt2) / n] [σ2^ / ∑(Xt – XBAR)2]
Cov(α^, β^)^ = –XBAR [σ2^ / ∑(Xt – XBAR)2]
The estimators for s.e.(β^) and s.e.(α^) are the square roots of the estimates of the variances.
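The following Python sketch computes σ2^, the standard error of the regression, and the estimated variances and standard errors of α^ and β^ using the formulas above, for a small hypothetical data set.

    # Estimated error variance, standard error of the regression, and estimated
    # variances / standard errors of the OLS estimators (hypothetical data set).
    import numpy as np

    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
    n, k = len(X), 2                        # k = number of regression coefficients

    x_bar, y_bar = X.mean(), Y.mean()
    beta_hat = np.sum((X - x_bar) * (Y - y_bar)) / np.sum((X - x_bar) ** 2)
    alpha_hat = y_bar - beta_hat * x_bar

    residuals = Y - alpha_hat - beta_hat * X
    rss = np.sum(residuals ** 2)            # residual sum of squares
    sigma2_hat = rss / (n - k)              # sigma^2 hat = RSS / (n - k)
    ser = np.sqrt(sigma2_hat)               # standard error of the regression

    var_beta_hat = sigma2_hat / np.sum((X - x_bar) ** 2)
    var_alpha_hat = (np.sum(X ** 2) / n) * sigma2_hat / np.sum((X - x_bar) ** 2)
    se_beta_hat, se_alpha_hat = np.sqrt(var_beta_hat), np.sqrt(var_alpha_hat)
    print(sigma2_hat, ser, se_beta_hat, se_alpha_hat)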
INTERVAL ESTIMATES OF α AND β
There are many different formulas for constructing interval estimates for different estimators. For
the OLS estimators α^ and β^, the general formula for constructing an interval estimate is
Point Estimate ± (Critical Value) • (Standard Error of the Estimate)
The critical value and the standard error of the estimate are obtained from the sampling
distribution of the estimator. To determine the critical value, you choose a degree of confidence.
The degree of confidence is the probability that the interval will contain the true value of the
population parameter. If the sampling distribution of the estimator is a normal distribution, then
the critical value is taken from the standard normal distribution (true standard error) or the
t-distribution (estimated standard error). The standard error of the point estimate is the standard
deviation of the sampling distribution of the estimator that produced the point estimate. It
measures the precision of the point estimate.
Margin of Error of the Point Estimate
The product of the critical value and the standard error is called the margin of error of the point
estimate. It measures the maximum error in the point estimate, with some given probability. This
tells us how certain or confident we can be that the point estimate is close to the true value of the
population parameter.
Rule-of-Thumb for Constructing a 95% Interval Estimate
A rule-of-thumb often times used to construct a 95% interval estimate is to use 2 as the critical
value, which is the approximate critical value for a 95% degree of confidence from the
t-distribution for a sample size of 30 or larger.
Rule-of-Thumb 95% OLS Interval Estimates for α and β
For the OLS estimator, the rule-of-thumb 95% interval estimates for the regression coefficients α
and β are given by β^ ± 2 • s.e.(β^)^ and α^ ± 2 • s.e.(α^)^, where s.e.(β^)^ = √[σ2^ / ∑(Xt –
XBAR)2] and s.e.(α^)^ = √{[(∑Xt2) / n] [σ2^ / ∑(Xt – XBAR)2]} are the estimated standard errors of the
estimates.
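A minimal sketch of the rule-of-thumb interval, using hypothetical values for the estimate and its estimated standard error:

    # Rule-of-thumb 95% interval estimate: point estimate +/- 2 * estimated standard error.
    # The estimate and standard error below are hypothetical illustration values.
    beta_hat, se_beta_hat = 0.98, 0.07
    lower, upper = beta_hat - 2 * se_beta_hat, beta_hat + 2 * se_beta_hat
    print(f"95% interval estimate for beta: [{lower:.3f}, {upper:.3f}]")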
HYPOTHESIS TESTING
The next step in an empirical study is to test hypotheses about the parameters of the statistical
model.
Hypothesis
An hypothesis is an assertion about the value of one or more population parameters. It is
expressed as a restriction on the parameters of a statistical model.
LOGIC OF HYPOTHESIS TESTING
Suppose we want to test the hypothesis β = 0. Choose an estimator β^. Use the sample data to
obtain an estimate, β^ = 1.00. Because β^ ≠ 0, should we reject the hypothesis and conclude it’s
false? Not necessarily. The estimate will almost always be different from the true value of the
population parameter because of random sampling estimation error. To decide if the hypothesis is
likely to be true or false, we compare the estimate β^ = 1.00 to the hypothesized value β = 0. If the
estimate is reasonably close to the hypothesized value, then we conclude that the difference is
likely to be the result of random sampling error (chance), and therefore we accept the hypothesis
β = 0. If the estimate is significantly different from the hypothesized value, then we conclude that
the difference is likely to be a real difference, and therefore we reject the hypothesis β = 0. How
do we determine if the estimate β^ = 1.00 is likely to be reasonably close or significantly different
from the hypothesized value β = 0? We use the sampling distribution of the estimator β^ to
calculate the probability of obtaining an estimate at least as large as β^ = 1.00 if the hypothesis β
= 0 is true. If this probability is small, say 5% or less, then we conclude that β^ = 1.00 is
significantly different from β = 0 and we reject the hypothesis. If this probability is large, say
greater than 5%, then we conclude that β^ = 1.00 is reasonably close to β = 0 and we accept the
hypothesis.
Null Hypothesis, Type I and II Errors, Level of Significance, Power of Test
The hypothesis that we are testing is called the null hypothesis. When testing the null hypothesis,
we can make a correct conclusion or an incorrect conclusion. We make a correct conclusion
when we: 1) reject the null hypothesis when it is false, 2) accept the null hypothesis when it is
true. We make an incorrect conclusion when we 1) reject the null hypothesis when it is true, 2)
accept the null hypothesis when it is false. Thus, when testing an hypothesis we can make two
errors. A type I error occurs when we reject the null hypothesis when it is true. A type II error
occurs when we accept the null hypothesis when it is false. The probability of making a type I
error is called the level of significance of the test. The level of significance of the test is also
called the size of the test. The power of the test is the probability of rejecting the null hypothesis
when it is false. It is given by one minus the probability of a type II error.
PROCEDURES FOR TESTING HYPOTHESES
There are 3 alternative approaches that can be used to test an hypothesis. 1) Level of significance
approach. 2) Confidence interval approach. 3) P-value approach. In this class, we will use the
level of significance approach.
LEVEL OF SIGNIFICANCE APPROACH
There are 5 basic steps involved in the level of significance approach.
1. Specify the null and alternative hypotheses.
2. Derive a test statistic and the sampling distribution of the test statistic under the null
hypothesis.
3. Choose a level of significance and find the critical value(s) for the test statistic.
4. Use the sample data to calculate the actual value of the test statistic.
5. Compare the actual (calculated) value of the test statistic to the critical value and accept or
reject the null hypothesis.
Null and Alternative Hypotheses
The first step in testing an hypothesis is to state the null and alternative hypotheses. The null
hypothesis is an assertion about the value(s) of one or more parameters. The test is designed to
determine the “strength of evidence” against the null hypothesis. We usually hope to reject the
null. The alternative hypothesis is the hypothesis we usually hope is true. We usually hope to
accept the alternative. The most often tested null hypothesis is the hypothesis that one variable,
X, has no effect on some other variable Y. The alternative hypothesis is that X has an effect on Y.
We hope to reject the null hypothesis of “no effect” and accept the alternative hypothesis that
there is an effect. The null and alternative hypotheses are written as Ho: β = 0 and H1: β ≠ 0. This
is called a two-sided or two-tailed test because the alternative hypothesis allows β to be greater
than or less than zero. If the alternative hypothesis was written as either β > 0 or β < 0, this would
be a one-sided or one-tailed test.
Derive a Test Statistic and the Sampling Distribution of the Test Statistic Under the Null
Hypothesis
The second step in testing an hypothesis is to derive a test statistic and the sampling distribution
of the test statistic under the null hypothesis. Like an estimator, a test statistic is a random
variable whose value depends on the sample data, varies from sample to sample, and can be
described by a sampling distribution. When we choose the t-statistic as our test statistic, we are
using what is called the t-test. When testing a hypothesis about the value of a single parameter in
the SCLRM, the appropriate test is the t-test. The t-statistic and its sampling distribution are: t =
(β^ – β) / s.e.(β^)^ ~ t(n – k), where β^ is the estimate, β is the hypothesized value, and s.e.(β^)^ is
the standard error of the estimate. The t-statistic has a t-distribution with n – k degrees of
freedom, where k is the number of regression coefficients in the model.
Choose the Level of Significance of the Test and Find the Critical Value(s) of the Test Statistic
The third step in testing a hypothesis is to choose the level of significance of the test and find the
critical value(s) of the test statistic. The level of significance of the test is the probability of
making a type I error; that is, the probability of rejecting the null hypothesis when it is true.
Use the Sample Data to Calculate the Actual Value of the Test Statistic
The fourth step in testing an hypothesis is to use the sample data and the estimator to calculate the
actual value of the test statistic.
Compare the Actual Value of the Test Statistic to the Critical Value
The fifth step in testing an hypothesis is to compare the actual value of the test statistic, t, to the
critical value(s) of the test statistic, t*. If the absolute value of the actual value is greater than or
equal to (less than) the critical value, then reject (accept) the null hypothesis. If we reject
(accept) the null hypothesis, we say that the estimate is statistically significant (insignificant) at
the chosen level of significance.
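A minimal Python sketch of the five steps for the two-tailed t-test of Ho: β = 0, using hypothetical values for the estimate, its standard error, and the sample size. The exact critical value from the t-distribution is used here rather than the rule-of-thumb value of 2.

    # Two-tailed t-test of H0: beta = 0 against H1: beta != 0
    # (hypothetical estimate, standard error, and sample size).
    from scipy import stats

    beta_hat, se_beta_hat = 0.98, 0.07
    n, k = 50, 2
    beta_null = 0.0

    t_actual = (beta_hat - beta_null) / se_beta_hat     # actual value of the test statistic
    t_critical = stats.t.ppf(1 - 0.05 / 2, df=n - k)    # 5% level of significance, two-tailed

    if abs(t_actual) >= t_critical:
        print("Reject H0: the estimate is statistically significant at the 5% level.")
    else:
        print("Accept H0: the estimate is statistically insignificant at the 5% level.")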
P-VALUE AS A MEASURE OF STRENGTH OF EVIDENCE AGAINST THE NULL
HYPOTHESIS
The P-value is the probability of obtaining a parameter estimate at least as large as the one
produced by an estimator, if the null hypothesis is true. The p-value can be interpreted as a
measure of the strength of evidence against the null hypothesis, and for the alternative hypothesis.
The smaller (larger) the p-value, the stronger (weaker) the evidence against the null and for the
alternative hypothesis.
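A sketch of the two-tailed p-value calculation from the t-statistic, assuming hypothetical values for the t-statistic and the degrees of freedom:

    # Two-tailed p-value for the t-test of H0: beta = 0
    # (hypothetical t-statistic and degrees of freedom).
    from scipy import stats

    t_actual, df = 2.45, 48
    p_value = 2 * stats.t.sf(abs(t_actual), df)   # P(|t| >= |t_actual|) if H0 is true
    print(p_value)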
DOES X HAVE AN EFFECT ON Y?: TWO APPROACHES TO ANSWERING THIS
QUESTION
If our objective is to explain an economic relationship, the first question we ask is: “Does X have
an effect on Y?” Two approaches can be used to answer this question. 1) Level of significance
hypothesis test approach. 2) P-value strength of evidence approach.
Level of Significance Hypothesis Test Approach
To answer this question “yes or no” we must use the level of significance approach. This
approach requires us to choose a specific level of significance. Choosing a level of significance is
somewhat arbitrary. It depends upon how willing we are to make the incorrect conclusion. If we
want very badly to avoid a Type I error, we will want to choose a level of significance of 0.01 or
less. If we want very badly to avoid a Type II error, we will want to choose a level of significance
of 0.05 or higher.
P-Value Strength of Evidence Approach
Rather than answering this question yes or no, we might simply provide a measure of the strength
of evidence for an effect. The smaller (larger) the p-value, the stronger (weaker) the evidence for
an effect. Alternatively, the smaller (larger) the p-value, the smaller (larger) the probability the
observed effect is the result of chance, and the larger (smaller) the probability the observed effect
is a real effect.
PREDICTION AND GOODNESS OF FIT
To use X to predict Y, we use the sample regression function. The sample regression function is
the regression function with the estimated values of the parameters. We then substitute a value of
X into the sample regression function and calculate the corresponding value of Y. This value of Y
is the predicted value of Y.
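A minimal sketch of prediction with the sample regression function, using hypothetical estimated coefficients:

    # Predicted value of Y from the sample regression function
    # Y_hat = alpha_hat + beta_hat * X (hypothetical estimated coefficients).
    alpha_hat, beta_hat = 1.37, 0.98
    x_new = 6.0
    y_pred = alpha_hat + beta_hat * x_new
    print(y_pred)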
Measures of Goodness of Fit
If our objective is to use X to predict Y, then we should measure the goodness of fit of the model.
Goodness-of-fit refers to how well the model (regression line) fits the sample data. The better the
model fits the data, the higher the predictive validity of the model, and therefore the better values
of X should predict values of Y. The two most often used measures of goodness of fit for the
SCLRM are the standard error of the regression, and the R2 statistic.
Standard error of the regression
The standard error of the regression (SER) is given by the square root of the estimated error
variance, SER = √σ2^ = √[RSS / (n – k)]. It measures how far a typical Y-value differs from the
predicted value of Y given by the regression line. The smaller (larger) the standard error of the
regression, the better (worse) the model fits the data, and therefore the higher (lower) the
predictive validity of the model.
R2 Statistic
The R2 measures the proportion of the variation in the dependent variable that is explained by the
variation in the explanatory variable. It can take any value between 0 and 1. If the R2 is equal to
one, all of the data points lie on the sample regression line, and therefore the explanatory variable
explains all of the variation in the dependent variable. If the R2 statistic is equal to zero the data
points are highly scattered around the regression line, which is a horizontal line, and therefore the
explanatory variable explains none of the variation in the dependent variable. The closer R2 to 1,
the more tightly the data points are clustered around the regression line, and therefore the larger
the proportion of the variation in Y explained by X. For example, if the R2 statistic is 0.40 this
means that X explains 40% of the variation in Y. The remaining 60% of the variation in Y is
explained by factors other than X that affect Y, summarized by the error term. The larger
(smaller) the R2 statistic, the better (worse) the model fits the data, and therefore the higher
(lower) the predictive validity of the model. The R2 statistic is calculated using either of the
following formulas: R2 = ESS / TSS or R2 = 1 – (RSS / TSS), where ESS is the explained sum of
squares, RSS is the residual sum of squares, and TSS is the total sum of squares for Y.
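The R2 calculation via R2 = 1 – (RSS / TSS) can be sketched in Python as follows, using the same hypothetical data set as in the estimation examples.

    # R-squared as 1 - RSS/TSS (hypothetical data; OLS fit as in the estimation section).
    import numpy as np

    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

    x_bar, y_bar = X.mean(), Y.mean()
    beta_hat = np.sum((X - x_bar) * (Y - y_bar)) / np.sum((X - x_bar) ** 2)
    alpha_hat = y_bar - beta_hat * x_bar

    residuals = Y - alpha_hat - beta_hat * X
    rss = np.sum(residuals ** 2)            # residual sum of squares
    tss = np.sum((Y - y_bar) ** 2)          # total sum of squares
    r_squared = 1 - rss / tss
    print(r_squared)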
DRAWING CONCLUSIONS FROM THE STUDY
The final step of an empirical study has two parts. 1) Make conclusions about the economic
relationship. 2) Assess the validity of the conclusions
CONCLUSIONS
If the objective of the study is to explain the relationship between X and Y, then our conclusions
should answer the following questions. Does X affect Y? If so, what are the direction and size of
the effect? What mechanism produces the effect?
VALIDITY OF CONCLUSIONS
To assess the validity of the conclusions we use two criteria. 1) Internal validity. 2) External
validity.
INTERNAL VALIDITY
An empirical study is internally valid if the conclusions about the independent causal effect of X
on Y are valid for the population being studied.
Criteria for Internal Validity
To assess internal validity, we must address two questions.
1. Is the estimate of the effect of X on Y unbiased?
2. Is the standard error of the estimate of the effect of X on Y correct?
If the estimate is biased, then it is systematically too high or low, and we can’t have much
confidence in our conclusions. If the standard error is incorrect, then the t-test, p-value, margin of
error, and confidence interval are incorrect, and therefore our conclusions may be incorrect.
UNBIASEDNESS OF ESTIMATE
The most important potential sources of bias are confounding variables and reverse causation.
Confounding Variables
A variable Z is a confounding variable if the effect of Z on Y cannot be separated from the effect
of X on Y. Two conditions are necessary for Z to be a confounding variable. 1) Z has an effect
on Y. 2) Z is correlated with X. There are two types of confounding variables. 1) Observable. 2)
Unobservable. Z is observable if we have data for it. Z is unobservable if we don’t have data for
it. If we don’t control for confounding variables, then our estimate of the effect of X on Y will be
biased in small samples, and inconsistent in large samples. This is called omitted variable bias.
Identification of Potential Confounding Variables
To identify potential confounding variables, we can use economic theory, past studies,
experience, intuition, or a formal statistical test.
Controlling for Confounding Variables
There are 4 methods to control for confounding variables. 1) Specify a multiple classical linear
regression model (MCLRM) and include the confounding variable(s) as explanatory variables in
the model. This method can be used only if the confounding variable(s) are observable. If the
confounding variable(s) is unobservable, then one of the following 3 methods can be used. 2)
Collect panel data and specify an error components statistical model. 3) Use an instrumental
variables estimator. This requires data on a variable(s) correlated with X but not correlated with
μ; that is, a variable(s) that does not have a direct effect on Y, but has an indirect effect on Y by
affecting X. 4) Conduct a randomized controlled experiment.
Reverse Causation
Reverse causality exists if Y causes X, and therefore the effect of Y on X cannot be separated
from the effect of X on Y. Because the OLS estimator β^ picks up both effects, it is biased and
inconsistent. Reverse causality produces a correlation between X and μ. Why? If X causes Y, and
Y causes X then we have two equations, Y = α + βX + μ and X = γ + θY + ν. When μ changes
Y changes. When Y changes X changes. Therefore, μ is correlated with X. Because Corr(μ, X) ≠
0, the error term does not have mean zero. We know that if the error term does not have mean
zero, then the OLS estimator is biased. This is called simultaneous equations bias.
Identification of Reverse Causation
To identify reverse causation, we can use economic theory, past studies, experience, intuition, or
a formal statistical test.
Controlling for Reverse Causation
There are 2 methods to control for reverse causality. 1) Use an instrumental variables estimator.
2) Conduct a randomized controlled experiment.
CORRECT STANDARD ERROR
The two most important sources of incorrect standard errors are the following. 1)
Heteroskedasticity. 2) Autocorrelation.
Heteroskedasticity
Heteroskedasticity exists when the error term has non-constant variance. This violates an
assumption of the SCLRM that is required to obtain the correct formula for the standard error.
This causes the estimated standard error of the OLS estimate to be incorrect. As a result, the
t-test, p-value, margin of error, and confidence interval are incorrect.
Identifying Heteroskedasticity
If the error term has constant variance, then the conditional variance of Y is the same for all
values of X. Theory, past studies, experience, intuition, or a formal statistical test can be used to
assess whether the conditional variance of Y depends upon X.
Obtaining the Correct Standard Error
To obtain the correct standard error, we can calculate a heteroskedasticity-robust standard error,
using Halbert White’s method. Most statistical software packages will do this for us upon request.
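As a sketch of what such a package computes, the following Python code calculates a White-type heteroskedasticity-robust (HC0) standard error of β^ for the simple regression, using the robust variance formula ∑(Xt – XBAR)2 μt^2 / [∑(Xt – XBAR)2]2 and a hypothetical data set. Packages often apply an additional degrees-of-freedom correction, so their reported values may differ slightly.

    # Heteroskedasticity-robust (White, HC0) standard error of beta_hat for the
    # simple regression, using a hypothetical data set.
    import numpy as np

    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

    x_bar, y_bar = X.mean(), Y.mean()
    beta_hat = np.sum((X - x_bar) * (Y - y_bar)) / np.sum((X - x_bar) ** 2)
    alpha_hat = y_bar - beta_hat * x_bar
    u_hat = Y - alpha_hat - beta_hat * X    # OLS residuals

    sxx = np.sum((X - x_bar) ** 2)
    var_robust = np.sum(((X - x_bar) ** 2) * (u_hat ** 2)) / sxx ** 2
    se_robust = np.sqrt(var_robust)         # heteroskedasticity-robust s.e.(beta_hat)
    print(se_robust)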
Autocorrelation
Autocorrelation exists when the errors for different units in the population are correlated with one
another. This violates an assumption of the SCLRM that is required to obtain the correct formula
for the standard error. This causes the estimated standard error of the OLS estimate to be
incorrect. As a result, the t-test, p-value, margin of error, and confidence interval are incorrect.
Identifying Autocorrelation
Autocorrelation is most likely to exist when we use time series data.
EXTERNAL VALIDITY
An empirical study is externally valid if the conclusions about the independent causal effect of X
on Y can be generalized from the population and setting studied to the population and setting of
interest. An empirical study is externally valid if there are not big differences between the
population and setting being studied, and the population and setting of interest.