GLOSSARY

REGRESSION ANALYSIS - A statistical procedure used to develop an equation showing how the variables are related.
DEPENDENT VARIABLE - The variable that is being predicted or explained. It is denoted by y and is often referred to as the response.
INDEPENDENT VARIABLE(S) - The variable (or variables) used for predicting or explaining values of the dependent variable. It is denoted by x and is often referred to as the predictor variable.
SIMPLE REGRESSION - Regression analysis involving one dependent variable and one independent variable.
LINEAR REGRESSION - Regression analysis in which relationships between the independent variables and the dependent variable are approximated by a straight line.
MULTIPLE REGRESSION - Regression analysis involving one dependent variable and more than one independent variable.
REGRESSION MODEL - The equation that describes how the dependent variable y is related to an independent variable x and an error term; the simple linear regression model is $y = \beta_0 + \beta_1 x + \varepsilon$, and the multiple linear regression model is $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_q x_q + \varepsilon$.
PARAMETER - A measurable factor that defines a characteristic of a population, process, or system.
RANDOM VARIABLE - The outcome of a random experiment (such as the drawing of a random sample), which therefore represents an uncertain outcome.
REGRESSION EQUATION - The equation that describes how the expected value of y for a given value of x, denoted $E(y \mid x)$, is related to x. The simple linear regression equation is $E(y \mid x) = \beta_0 + \beta_1 x$, and the multiple linear regression equation is $E(y \mid x_1, x_2, \ldots, x_q) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_q x_q$.
ESTIMATED REGRESSION EQUATION - The estimate of the regression equation developed from sample data by using the least squares method. The estimated simple linear regression equation is $\hat{y} = b_0 + b_1 x$, and the estimated multiple linear regression equation is $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_q x_q$.
POINT ESTIMATOR - A single value used as an estimate of the corresponding population parameter.
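As an illustration of how an estimated simple linear regression equation is obtained from sample data, the following minimal sketch computes the least squares point estimates b0 and b1 from the standard closed-form formulas. The function name `fit_simple_linear` and the sample data are hypothetical, chosen only for this example.

```python
from statistics import mean

def fit_simple_linear(x, y):
    """Return (b0, b1) for y_hat = b0 + b1 * x, minimizing the sum of squared residuals."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # estimated slope
    b0 = y_bar - b1 * x_bar   # estimated intercept
    return b0, b1

# Hypothetical sample data (assumption): the points lie exactly on y = 1 + 2x.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
b0, b1 = fit_simple_linear(x, y)
print(f"y_hat = {b0:.2f} + {b1:.2f} x")
```

Because the made-up data have no error term, the estimates recover the intercept 1 and slope 2 up to floating-point rounding; with real sample data, b0 and b1 are only point estimates of the population parameters.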
LEAST SQUARES METHOD - A procedure for using sample data to find the estimated regression equation.
RESIDUAL - The difference between the observed value of the dependent variable and the value predicted using the estimated regression equation; for the ith observation, the ith residual is $y_i - \hat{y}_i$.
EXPERIMENTAL REGION - The range of values for the independent variables $x_1, x_2, \ldots, x_q$ for the data that are used to estimate the regression model.
EXTRAPOLATION - Prediction of the mean value of the dependent variable y for values of the independent variables $x_1, x_2, \ldots, x_q$ that are outside the experimental region.
COEFFICIENT OF DETERMINATION - A measure of the goodness of fit of the estimated regression equation. It can be interpreted as the proportion of the variability in the dependent variable y that is explained by the estimated regression equation.
STATISTICAL INFERENCE - The process of making estimates and drawing conclusions about one or more characteristics of a population (the value of one or more parameters) through analysis of sample data drawn from the population.
HYPOTHESIS TESTING - The process of making a conjecture about the value of a population parameter, collecting sample data that can be used to assess this conjecture, measuring the strength of the evidence against the conjecture that is provided by the sample, and using these results to draw a conclusion about the conjecture.
INTERVAL ESTIMATION - The use of sample data to calculate a range of values that is believed to include the unknown value of a population parameter.
F TEST - Statistical test based on the F probability distribution that can be used to test the hypothesis that the values of $\beta_1, \beta_2, \ldots, \beta_q$ are all zero; if this hypothesis is rejected, we conclude that there is an overall regression relationship.
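The terms residual, coefficient of determination, and extrapolation can be made concrete with a short sketch. It fits a simple linear regression by least squares, computes the residuals, and computes the coefficient of determination as 1 - SSE/SST, where SSE is the sum of squared residuals and SST is the total sum of squared deviations of y from its mean. The data and the helper `is_extrapolation` are hypothetical, introduced only for illustration.

```python
from statistics import mean

# Hypothetical sample data (assumption), made up for illustration only.
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]

# Least squares estimates b0 and b1 for y_hat = b0 + b1 * x.
x_bar, y_bar = mean(x), mean(y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residual for observation i: observed value minus predicted value.
y_hat = [b0 + b1 * xi for xi in x]
residuals = [yi - yhi for yi, yhi in zip(y, y_hat)]

# Coefficient of determination: share of the variability in y explained by the fit.
sse = sum(e ** 2 for e in residuals)
sst = sum((yi - y_bar) ** 2 for yi in y)
r_squared = 1 - sse / sst

def is_extrapolation(x_new):
    """True if x_new lies outside the range of x values used to fit the model."""
    return not (min(x) <= x_new <= max(x))

print(f"R^2 = {r_squared:.3f}")
print("x = 10 is extrapolation:", is_extrapolation(10))
```

With one independent variable, the experimental region is simply the interval from the smallest to the largest observed x; with several independent variables, a per-variable range check like this one is only a rough approximation of that region.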
P-VALUE - The probability that a random sample of the same size collected from the same population using the same procedure will yield stronger evidence against a hypothesis than the evidence in the sample data, given that the hypothesis is actually true.
T TEST - Statistical test based on the Student's t probability distribution that can be used to test the hypothesis that a regression parameter $\beta_j$ is zero; if this hypothesis is rejected, we conclude that there is a regression relationship between the jth independent variable and the dependent variable.
CONFIDENCE INTERVAL - An estimate of a population parameter that provides an interval believed to contain the value of the parameter at some level of confidence.
CONFIDENCE LEVEL - An indication of how frequently interval estimates based on samples of the same size taken from the same population using identical sampling techniques will contain the true value of the parameter we are estimating.
MULTICOLLINEARITY - The degree of correlation among independent variables in a regression.
DUMMY VARIABLE - A variable used to model the effect of categorical independent variables in a regression model; generally takes only the value zero or one.
QUADRATIC REGRESSION MODEL - Regression model in which a nonlinear relationship between the independent and dependent variables is fit by including the independent variable and the square of the independent variable in the model: $\hat{y} = b_0 + b_1 x_1 + b_2 x_1^2$; also referred to as a second-order polynomial model.
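A quadratic regression model is still fit by least squares; the only change is that the square of the independent variable is added as a second predictor. The sketch below is one possible way to do this with the standard library alone, by building and solving the normal equations. The function names `solve_linear_system` and `fit_quadratic` and the data are hypothetical; in practice a numerical library would normally handle the solve.

```python
def solve_linear_system(a, b):
    """Solve a * coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - s) / m[r][r]
    return coef

def fit_quadratic(x, y):
    """Least squares fit of y_hat = b0 + b1*x + b2*x**2 via the normal equations."""
    rows = [[1.0, xi, xi ** 2] for xi in x]  # columns: intercept, x, x squared
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve_linear_system(xtx, xty)

# Hypothetical data (assumption): generated exactly from y = 1 + 2x - 0.5x^2.
x = [0, 1, 2, 3, 4]
y = [1 + 2 * xi - 0.5 * xi ** 2 for xi in x]
b0, b1, b2 = fit_quadratic(x, y)
print(f"y_hat = {b0:.2f} + {b1:.2f} x + {b2:.2f} x^2")
```

Note that x and its square are usually correlated, so a quadratic model is also a simple place to observe multicollinearity among the predictors.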
PIECEWISE LINEAR REGRESSION MODEL - Regression model in which one linear relationship between the independent and dependent variables is fit for values of the independent variable below a prespecified value of the independent variable, a different linear relationship between the independent and dependent variables is fit for values of the independent variable above the prespecified value, and the two regressions have the same estimated value of the dependent variable (i.e., are joined) at the prespecified value.
KNOT - The prespecified value of the independent variable at which its relationship with the dependent variable changes in a piecewise linear regression model; also called the breakpoint or the joint.
INTERACTION - The situation in which the relationship between the dependent variable and one independent variable is different at different values of a second independent variable.
OVERFITTING - Fitting a model too closely to sample data, resulting in a model that does not accurately reflect the population.
TRAINING SET - The data set used to build the candidate models.
VALIDATION SET - The data set used to compare model forecasts and ultimately pick a model for predicting values of the dependent variable.
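The roles of the training set and the validation set can be sketched as follows: candidate models are built on the training set only, and the validation set is then used to compare their predictions and pick one. This is a minimal illustration under stated assumptions: the data are synthetic, the 70/30 split and the mean squared error criterion are arbitrary choices, and the two candidate models (`fit_mean_only`, `fit_simple_linear`) are hypothetical names introduced here.

```python
import random
from statistics import mean

def fit_mean_only(x, y):
    """Candidate 1: ignore x and always predict the training mean of y."""
    y_bar = mean(y)
    return lambda xi: y_bar

def fit_simple_linear(x, y):
    """Candidate 2: simple linear regression fit by least squares."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return lambda xi: b0 + b1 * xi

def mse(model, x, y):
    """Mean squared prediction error of a fitted model on the given data."""
    return mean([(yi - model(xi)) ** 2 for xi, yi in zip(x, y)])

# Synthetic data (assumption): a linear trend plus random noise.
rng = random.Random(0)
x = [i / 2 for i in range(40)]
y = [3 + 1.5 * xi + rng.gauss(0, 1) for xi in x]

# Randomly assign 70% of the observations to training, the rest to validation.
idx = list(range(len(x)))
rng.shuffle(idx)
cut = int(0.7 * len(idx))
train, valid = idx[:cut], idx[cut:]
x_tr, y_tr = [x[i] for i in train], [y[i] for i in train]
x_va, y_va = [x[i] for i in valid], [y[i] for i in valid]

# Build each candidate on the training set, then compare on the validation set.
candidates = {"mean only": fit_mean_only, "simple linear": fit_simple_linear}
scores = {name: mse(fit(x_tr, y_tr), x_va, y_va) for name, fit in candidates.items()}
best = min(scores, key=scores.get)
print("validation MSE:", scores)
print("selected model:", best)
```

Scoring the candidates on data that were not used to fit them is what guards against overfitting: a model that merely memorizes the training set tends to do poorly on the validation set.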