Uploaded by Ken

Glossary-BAFPRED

GLOSSARY

REGRESSION ANALYSIS - A statistical procedure used to develop an equation showing how the
variables are related.
DEPENDENT VARIABLE - The variable that is being predicted or explained. It is denoted by y and is
often referred to as the response.
INDEPENDENT VARIABLE(S) - The variable (or variables) used for predicting or explaining values
of the dependent variable. It is denoted by x and is often referred to as the predictor variable.
SIMPLE REGRESSION - Regression analysis involving one dependent variable and one independent
variable.
LINEAR REGRESSION - Regression analysis in which relationships between the independent variables
and the dependent variable are approximated by a straight line.
MULTIPLE REGRESSION - Regression analysis involving one dependent variable and more than one
independent variable.
REGRESSION MODEL - The equation that describes how the dependent variable y is related to an
independent variable x and an error term; the simple linear regression model is $y = \beta_0 + \beta_1 x + \varepsilon$, and the
multiple linear regression model is $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_q x_q + \varepsilon$.
PARAMETER - A measurable factor that defines a characteristic of a population, process, or system.
RANDOM VARIABLE - The outcome of a random experiment (such as the drawing of a random sample)
and so represents an uncertain outcome.
REGRESSION EQUATION - The equation that describes how the expected value of y for a given value
of x, denoted $E(y \mid x)$, is related to x. The simple linear regression equation is $E(y \mid x) = \beta_0 + \beta_1 x$ and the
multiple linear regression equation is $E(y \mid x_1, x_2, \ldots, x_q) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_q x_q$.
ESTIMATED REGRESSION EQUATION - The estimate of the regression equation developed from
sample data by using the least squares method. The estimated simple linear regression equation is $\hat{y} = b_0 + b_1 x$,
and the estimated multiple linear regression equation is $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_q x_q$.
POINT ESTIMATOR - A single value used as an estimate of the corresponding population parameter.
LEAST SQUARES METHOD - A procedure for using sample data to find the estimated regression
equation by choosing the coefficients that minimize the sum of the squared residuals.
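For simple linear regression the least squares method has a closed-form solution: $b_1 = \sum_i (x_i - \bar{x})(y_i - \bar{y}) / \sum_i (x_i - \bar{x})^2$ and $b_0 = \bar{y} - b_1 \bar{x}$, which are the values that minimize the sum of squared residuals. The Python sketch below assumes plain lists of numbers; the function name `fit_simple` and the sample data are hypothetical.

```python
def fit_simple(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals for y_hat = b0 + b1*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Cross-deviation and squared-deviation sums
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data lying exactly on y = 2 + 3x
b0, b1 = fit_simple([1, 2, 3, 4], [5, 8, 11, 14])
print(b0, b1)  # 2.0 3.0
```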
RESIDUAL - The difference between the observed value of the dependent variable and the value
predicted using the estimated regression equation; for the ith observation, the ith residual is $y_i - \hat{y}_i$.
EXPERIMENTAL REGION - The range of values for the independent variables $x_1, x_2, \ldots, x_q$ for the data
that are used to estimate the regression model.
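Given an estimated equation, each residual is the observed $y_i$ minus the prediction $\hat{y}_i$. A small sketch, assuming hypothetical data and coefficients $b_0 = 2$, $b_1 = 3$ supplied by hand (they are not the least squares estimates for these points); the helper name `residuals` is hypothetical.

```python
def residuals(x, y, b0, b1):
    """Residual e_i = y_i - y_hat_i for the estimated line y_hat = b0 + b1*x."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


# Hypothetical observations and hand-picked coefficients
x = [1, 2, 3, 4]
y = [5, 9, 10, 14]
e = residuals(x, y, 2.0, 3.0)
print(e)  # [0.0, 1.0, -1.0, 0.0]

# Sum of squared residuals (SSE), the quantity least squares minimizes
sse = sum(ei ** 2 for ei in e)
print(sse)  # 2.0
```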
EXTRAPOLATION - Prediction of the mean value of the dependent variable y for values of the
independent variables $x_1, x_2, \ldots, x_q$ that are outside the experimental region.
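One simple way to flag extrapolation is to check whether each independent variable of a new observation falls within the minimum and maximum observed for that variable in the data. This is a simplifying assumption: with several independent variables the data may not fill that whole rectangular box, so passing the check does not guarantee the point lies near the data. The function name `in_experimental_region` and the data are hypothetical.

```python
def in_experimental_region(x_new, x_train):
    """True if every coordinate of x_new lies within the min/max of that column in x_train.

    x_train is a list of observations, each a tuple (x1, ..., xq).
    """
    for j, value in enumerate(x_new):
        column = [row[j] for row in x_train]
        if not (min(column) <= value <= max(column)):
            return False
    return True


x_train = [(1.0, 10.0), (2.0, 15.0), (3.0, 12.0)]  # hypothetical (x1, x2) data
print(in_experimental_region((2.5, 11.0), x_train))  # True
print(in_experimental_region((4.0, 11.0), x_train))  # False: x1 = 4.0 would be extrapolation
```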
COEFFICIENT OF DETERMINATION - A measure of the goodness of fit of the estimated regression
equation. It can be interpreted as the proportion of the variability in the dependent variable y that is
explained by the estimated regression equation.
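The coefficient of determination is commonly computed as $r^2 = 1 - \mathrm{SSE}/\mathrm{SST}$, where SSE is the sum of squared residuals and SST is the total sum of squared deviations of y from its mean. A minimal sketch, assuming the hypothetical data below and the least squares line $\hat{y} = 2.5 + 2.8x$ fitted to it:

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SSE/SST."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    return 1 - sse / sst


x = [1, 2, 3, 4]
y = [5, 9, 10, 14]
y_hat = [2.5 + 2.8 * xi for xi in x]  # least squares fit for this hypothetical data
print(round(r_squared(y, y_hat), 4))  # 0.9561
```

Here about 95.6% of the variability in y is explained by the estimated regression equation.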
STATISTICAL INFERENCE - The process of making estimates and drawing conclusions about one or
more characteristics of a population (the value of one or more parameters) through analysis of sample data
drawn from the population.
HYPOTHESIS TESTING - The process of making a conjecture about the value of a population parameter,
collecting sample data that can be used to assess this conjecture, measuring the strength of the evidence
against the conjecture that is provided by the sample, and using these results to draw a conclusion about
the conjecture.
INTERVAL ESTIMATION - The use of sample data to calculate a range of values that is believed to
include the unknown value of a population parameter.
F TEST - A statistical test based on the F probability distribution that can be used to test the hypothesis that
the values of $\beta_1, \beta_2, \ldots, \beta_q$ are all zero; if this hypothesis is rejected, we conclude that there is an overall
regression relationship.
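With q independent variables and n observations, the overall F statistic is $F = (\mathrm{SSR}/q) / (\mathrm{SSE}/(n - q - 1))$, where $\mathrm{SSR} = \mathrm{SST} - \mathrm{SSE}$ for a least squares fit that includes an intercept. The sketch uses hypothetical data and its least squares line; turning F into a p-value needs the upper-tail probability of the F distribution, which the Python standard library does not provide (a statistics package such as SciPy, via `scipy.stats.f.sf`, does). The function name is hypothetical.

```python
def f_statistic(y, y_hat, q):
    """Overall F statistic: (SSR / q) / (SSE / (n - q - 1))."""
    n = len(y)
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ssr = sst - sse            # valid for least squares fits with an intercept
    msr = ssr / q              # mean square due to regression
    mse = sse / (n - q - 1)    # mean square error
    return msr / mse


x = [1, 2, 3, 4]
y = [5, 9, 10, 14]
y_hat = [2.5 + 2.8 * xi for xi in x]  # least squares line for this hypothetical data
print(round(f_statistic(y, y_hat, q=1), 2))  # 43.56
```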
P-VALUE - The probability that a random sample of the same size collected from the same population
using the same procedure will yield stronger evidence against a hypothesis than the evidence in the sample
data, given that the hypothesis is actually true.
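The definition above can be illustrated by simulation. The sketch swaps in a permutation test, a resampling technique not named in this glossary: if y is unrelated to x, shuffling y among the observations produces new samples that are equally plausible, and the p-value is estimated as the fraction of shuffled samples whose slope is at least as large in magnitude as the observed one. The data, the number of shuffles, and the function names are assumptions for illustration.

```python
import random


def slope(x, y):
    """Least squares slope b1 for simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx


def permutation_p_value(x, y, reps=10_000, seed=0):
    """Estimate P(|b1*| >= |b1 observed|) assuming y is unrelated to x (H0: beta_1 = 0)."""
    rng = random.Random(seed)
    observed = abs(slope(x, y))
    y_perm = list(y)
    at_least_as_extreme = 0
    for _ in range(reps):
        rng.shuffle(y_perm)  # shuffling breaks any x-y link, mimicking H0
        if abs(slope(x, y_perm)) >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / reps


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 14.8, 17.2]  # hypothetical, strongly increasing
print(permutation_p_value(x, y))  # a very small value: strong evidence against H0
```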
T TEST - A statistical test based on the Student's t probability distribution that can be used to test the
hypothesis that a regression parameter $\beta_j$ is zero; if this hypothesis is rejected, we conclude that there is a
regression relationship between the jth independent variable and the dependent variable.
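For simple linear regression the t statistic for $H_0\colon \beta_1 = 0$ is $t = b_1 / s_{b_1}$, where $s_{b_1} = s / \sqrt{\sum_i (x_i - \bar{x})^2}$ and $s = \sqrt{\mathrm{SSE}/(n - 2)}$; the value is compared with a Student's t distribution with n - 2 degrees of freedom. A sketch with hypothetical data and a hypothetical function name:

```python
import math


def slope_t_statistic(x, y):
    """t statistic for H0: beta_1 = 0 in simple linear regression (df = n - 2)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))   # standard error of the estimate
    s_b1 = s / math.sqrt(sxx)      # estimated standard deviation of b1
    return b1 / s_b1


print(round(slope_t_statistic([1, 2, 3, 4], [5, 9, 10, 14]), 2))  # 6.6
```

For this data $t^2 \approx 43.56$, the same value as the overall F statistic; with a single independent variable the t test and the F test are equivalent.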
CONFIDENCE INTERVAL - An estimate of a population parameter that provides an interval believed
to contain the value of the parameter at some level of confidence.
CONFIDENCE LEVEL - An indication of how frequently interval estimates based on samples of the
same size taken from the same population using identical sampling techniques will contain the true value
of the parameter we are estimating.
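The meaning of a confidence level can be checked by simulation: draw many samples from a population with a known mean, build an interval from each, and count how often the intervals contain that mean. The sketch assumes a normal population with known standard deviation so that the simple interval $\bar{x} \pm z\,\sigma/\sqrt{n}$ applies; the function name and all parameter values are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def coverage_rate(mu=50.0, sigma=10.0, n=25, level=0.95, reps=4000, seed=1):
    """Fraction of intervals xbar +/- z*sigma/sqrt(n) that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / reps


print(coverage_rate())  # close to 0.95
```

At a 95% confidence level roughly 95% of the simulated intervals should contain mu; the exact fraction varies with the random seed.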
MULTICOLLINEARITY - The degree of correlation among independent variables in a regression.
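Multicollinearity is often screened for by computing the sample correlation between pairs of independent variables; a value near +1 or -1 means the two variables carry largely overlapping information, which makes their separate coefficients hard to estimate precisely. A sketch with hypothetical data and a hypothetical helper name:

```python
import math


def correlation(a, b):
    """Sample (Pearson) correlation coefficient between two equal-length lists."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


# Hypothetical independent variables: x2 is almost a rescaled copy of x1
x1 = [1, 2, 3, 4, 5]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]
print(round(correlation(x1, x2), 3))  # 0.999
```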
DUMMY VARIABLE - A variable used to model the effect of categorical independent variables in a
regression model; generally takes only the value zero or one.
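A categorical variable with k categories is typically represented by k - 1 dummy variables, with the omitted category serving as the baseline against which the other categories' coefficients are compared. A sketch, assuming hypothetical data and a hypothetical helper named `dummy_columns`:

```python
def dummy_columns(values, baseline):
    """One 0/1 column per non-baseline category (k categories -> k - 1 dummies)."""
    levels = sorted(set(values) - {baseline})
    return {lvl: [1 if v == lvl else 0 for v in values] for lvl in levels}


region = ["north", "south", "west", "north", "west"]  # hypothetical categorical data
print(dummy_columns(region, baseline="north"))
# {'south': [0, 1, 0, 0, 0], 'west': [0, 0, 1, 0, 1]}
```

An observation in the baseline category ("north" here) has zeros in every dummy column.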
QUADRATIC REGRESSION MODEL - A regression model in which a nonlinear relationship between
the independent and dependent variables is fit by including the independent variable and the square of the
independent variable in the model: $\hat{y} = b_0 + b_1 x_1 + b_2 x_1^2$; also referred to as a second-order polynomial model.
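A quadratic model is still fit by least squares: it is a multiple regression whose two independent variables are $x_1$ and $x_1^2$. The sketch builds those columns and solves the normal equations $(X^\top X)\,b = X^\top y$ with a small Gaussian-elimination routine; the helper names are hypothetical, and for real work a numerically sturdier solver (for example `numpy.linalg.lstsq`) would be preferable. The hypothetical data lie exactly on $y = 1 + 2x + 0.5x^2$, so the fit should recover those coefficients.

```python
def solve(a, b):
    """Solve the linear system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_least_squares(rows, y):
    """Least squares coefficients via the normal equations (X^T X) b = X^T y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def fit_quadratic(x, y):
    """Return [b0, b1, b2] for y_hat = b0 + b1*x + b2*x^2."""
    rows = [[1.0, xi, xi ** 2] for xi in x]  # design row: intercept, x, x squared
    return fit_least_squares(rows, y)


coef = fit_quadratic([0, 1, 2, 3, 4], [1, 3.5, 7, 11.5, 17])
print([round(c, 6) for c in coef])  # [1.0, 2.0, 0.5]
```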
PIECEWISE LINEAR REGRESSION MODEL - Regression model in which one linear relationship
between the independent and dependent variables is fit for values of the independent variable below a
prespecified value of the independent variable, a different linear relationship between the independent
and dependent variables is fit for values of the independent variable above the prespecified value of the
independent variable, and the two regressions have the same estimated value of the dependent variable
(i.e., are joined) at the prespecified value of the independent variable.
KNOT - The prespecified value of the independent variable at which its relationship with the dependent
variable changes in a piecewise linear regression model; also called the breakpoint or the joint.
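One common way to build a piecewise linear model with a knot at k is to include, alongside x, the hinge variable $\max(0, x - k)$ (equivalently, $(x - k)$ multiplied by a dummy variable that equals 1 when x > k). Because the hinge term is zero at the knot, the two lines are automatically joined there, and the slope changes from $b_1$ to $b_1 + b_2$. The function name and coefficients below are hypothetical; in practice the coefficients would be estimated by least squares.

```python
def piecewise_predict(x, b0, b1, b2, knot):
    """y_hat = b0 + b1*x + b2*max(0, x - knot): slope b1 below the knot, b1 + b2 above it."""
    return b0 + b1 * x + b2 * max(0.0, x - knot)


# Hypothetical coefficients with the knot at x = 10
b0, b1, b2, knot = 5.0, 2.0, -1.5, 10.0
print(piecewise_predict(9.0, b0, b1, b2, knot))   # 23.0
print(piecewise_predict(10.0, b0, b1, b2, knot))  # 25.0
print(piecewise_predict(12.0, b0, b1, b2, knot))  # 26.0
```

Below the knot the slope is 2.0; above it the slope is 2.0 - 1.5 = 0.5, and both segments give 25.0 at x = 10.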
INTERACTION - The relationship between the dependent variable and one independent variable is
different at different values of a second independent variable.
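Interaction is commonly modeled by adding the product of the two independent variables as a further term, $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_1 x_2$. The effect of a one-unit increase in $x_1$ is then $b_1 + b_3 x_2$, so it depends on the value of $x_2$. The function names and coefficients below are hypothetical.

```python
def predict_with_interaction(x1, x2, b0, b1, b2, b3):
    """y_hat = b0 + b1*x1 + b2*x2 + b3*x1*x2 (the product term models the interaction)."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


def slope_in_x1(x2, b1, b3):
    """Change in y_hat per unit increase of x1, holding x2 fixed: b1 + b3*x2."""
    return b1 + b3 * x2


coef = dict(b0=10.0, b1=3.0, b2=1.0, b3=-0.5)  # hypothetical estimates
print(predict_with_interaction(2.0, 4.0, **coef))  # 16.0
for x2 in (0.0, 2.0, 4.0):
    print(x2, slope_in_x1(x2, coef["b1"], coef["b3"]))
# 0.0 3.0
# 2.0 2.0
# 4.0 1.0
```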
OVERFITTING - Fitting a model too closely to sample data, resulting in a model that does not accurately
reflect the population.
TRAINING SET - The data set used to build the candidate models.
VALIDATION SET - The data set used to compare model forecasts and ultimately pick a model for
predicting values of the dependent variable.
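The training and validation sets can be sketched as a hold-out comparison: fit each candidate model on the training set only, then pick the one with the smallest mean squared error on the validation set. An overfitted model tends to look good on the training set but does worse on data it has not seen. In the sketch, the data, the 30/10 split, and the two candidate models (a mean-only model and a simple linear regression) are all assumptions for illustration.

```python
import random


def fit_line(x, y):
    """Simple linear regression by least squares; returns a prediction function."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
          / sum((a - x_bar) ** 2 for a in x))
    b0 = y_bar - b1 * x_bar
    return lambda v: b0 + b1 * v


def fit_mean(x, y):
    """Baseline model that ignores x and always predicts the training mean of y."""
    y_bar = sum(y) / len(y)
    return lambda v: y_bar


def mse(model, x, y):
    """Mean squared error of a prediction function on (x, y)."""
    return sum((yi - model(xi)) ** 2 for xi, yi in zip(x, y)) / len(y)


rng = random.Random(42)
x = [i / 2 for i in range(40)]
y = [2 + 3 * xi + rng.gauss(0, 1) for xi in x]  # hypothetical data
idx = list(range(len(x)))
rng.shuffle(idx)
train, valid = idx[:30], idx[30:]  # arbitrary 30/10 split
x_tr, y_tr = [x[i] for i in train], [y[i] for i in train]
x_va, y_va = [x[i] for i in valid], [y[i] for i in valid]

candidates = {"mean only": fit_mean, "linear": fit_line}
scores = {name: mse(fit(x_tr, y_tr), x_va, y_va) for name, fit in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)  # "linear" should have the far smaller validation MSE
```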