Curvilinear Bivariate Regression
You are now familiar with linear bivariate regression analysis. What do you do if the
relationship between X and Y is curvilinear? It may be possible to get a good analysis with our usual
techniques if we first “straighten up” the relationship with data transformations.
You may have a theory or model that indicates the nature of the nonlinear effect. For example,
if you had data relating the physical intensity of some stimulus to the psychologically perceived
intensity of the stimulus, Fechner’s law would suggest a logarithmic function (Stevens’ would suggest
a power function). To straighten out this log function all you would need to do is take the log of the
physical intensity scores and then complete the regression analysis using transformed physical
intensity scores to predict psychological intensity scores. For another example, suppose you have
monthly sales data for each of 25 consecutive months of a new business. You remember having
been taught about exponential growth curves in a business or a biology class, so you do the
regression analysis for predicting the log of monthly sales from the number of months the firm has
been in business.
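For instance, a minimal SAS sketch of the sales example might look like this (the data set name Sales and the variable names Month and Sales are hypothetical):

   data Sales2;
      set Sales;                   /* assumes variables Month and Sales */
      Log_Sales = LOG10(Sales);    /* base ten log of monthly sales */
   run;
   proc reg data=Sales2;
      model Log_Sales = Month;     /* a straight line on the log scale corresponds to exponential growth on the raw scale */
   run;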
In other cases you will have no such model; you simply discover (from the scatter plot) that the
relationship is curvilinear. Here are some suggestions for straightening up the line, assuming that the
relationship is monotonic.
A. If the curve for predicting Y from X is a negatively accelerated curve, a “curve of decreasing returns,” one where the positive slope decreases as X increases, try transforming X with the following: √X, LOG(X), -1/X. Prepare a scatter plot for each of these and choose the one that best straightens the line (and best assists in meeting the assumptions of any inferential statistics you are doing).
B. If the curve for predicting Y from X is a positively accelerated curve, one where the positive slope increases as X increases, try: √Y, LOG(Y), -1/Y.
C. One of my old Minitab manuals suggests that with a negative, decelerated curve one should try transforming X, Y, or both X and Y with some nonlinear transformation(s) such as LOG or SQRT. A SAS sketch for creating and comparing such candidate transformations follows this list.
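Here is a minimal sketch of suggestion A (the data set name Curvy and the variables X and Y are hypothetical; suggestion B works the same way, transforming Y instead of X):

   data Curvy2;
      set Curvy;
      X_Sqrt = SQRT(X);        /* square root of X */
      X_Log  = LOG10(X);       /* base ten log of X */
      X_Inv  = -1/X;           /* negative reciprocal of X */
   run;
   proc sgplot data=Curvy2; scatter x=X_Sqrt y=Y; run;   /* one plot per candidate; */
   proc sgplot data=Curvy2; scatter x=X_Log  y=Y; run;   /* choose the transformation that */
   proc sgplot data=Curvy2; scatter x=X_Inv  y=Y; run;   /* makes the plot most nearly linear */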

You can always just try a variety of nonlinear transformations and see what works best. One
handy transformation is to RANK the data. When done on both variables, the resulting r is a
Spearman correlation coefficient.
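In SAS you can rank the scores with Proc Rank, or skip that step and simply ask Proc Corr for the Spearman coefficient directly (the data set and variable names below are hypothetical):

   proc rank data=Curvy out=Ranked;
      var X Y;                    /* variables to be ranked */
      ranks X_Rank Y_Rank;        /* names for the ranked versions */
   run;
   proc corr data=Curvy spearman; /* Spearman rho, equivalent to Pearson r computed on the ranks */
      var X Y;
   run;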
To do a square root transformation of variable X in SAS, use a statement like this in the data step: X_Sqrt = SQRT(X); for a base ten log transformation, X_Log = LOG10(X); and for an inverse transformation, X_Inv = -1/X; . If you have scores of 0 or less, you will need to add an appropriate constant to X (large enough to eliminate scores of 0 or less) before applying a square root or log transformation, for example, X_Log = LOG10(X + 19); .
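Putting those statements together, the data step might look like this (Raw and Transformed are hypothetical data set names, and 19 is just an example of a constant; use whatever value makes the smallest score greater than zero):

   data Transformed;
      set Raw;                    /* Raw is assumed to contain the variable X */
      X_Sqrt = SQRT(X + 19);      /* shift first if X has scores of 0 or less */
      X_Log  = LOG10(X + 19);
      X_Inv  = -1/X;              /* per the note above, the shift is needed only for the square root and log */
   run;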
Please look at this example of the use of a log transformation where the relationship is
nonlinear, negative, monotonic.
Polynomial Regression
Monotonic nonlinear transformations (such as SQRT, LOG, and -1/X) of independent and/or
dependent variables may allow you to obtain a predictive model that has less error than does a linear
model, but if the relationship between X and Y is not monotonic, a polynomial regression may do a
better job. A polynomial has two or more terms. The polynomials we most often use in simple
polynomial regression are the quadratic, Ŷ = a + b₁X + b₂X², and the cubic, Ŷ = a + b₁X + b₂X² + b₃X³. With a quadratic, the slope for predicting Y from X changes direction once; with a cubic it changes direction twice.
Please run the program Curvi.sas from my SAS Programs page. This provides an example of
how to do a polynomial regression with SAS. The data were obtained from scatterplots in an article
by N. H. Copp (Animal Behavior, 31, 424-430). Ladybugs tend to form large winter aggregations,
clinging to one another in large clumps, perhaps to stay warm. In the laboratory, Copp observed, at
various temperatures, how many beetles (in groups of 100) were free (not aggregated). For each
group tested, we have the temperature at which they were tested and the number of ladybugs that
were free. Note that in the data step I create the powers of the temperature variable (temp2, temp3,
and temp4).
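The data step in Curvi.sas does something along these lines (the data set name and the name of the dependent variable, free, are my guesses; temp2, temp3, and temp4 are the names used in the program):

   data ladybugs;
      set ladybugs;      /* assumes temp and free (number of unaggregated beetles) were already read in */
      temp2 = temp**2;   /* quadratic term */
      temp3 = temp**3;   /* cubic term */
      temp4 = temp**4;   /* quartic term */
   run;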
Please note that a polynomial regression analysis is a sequential analysis. One first
evaluates a linear model. Then one adds a quadratic term and decides whether or not addition of
such a term is justified. Then one adds a cubic term and decides whether or not such an addition is
justified, etc.
Proc Reg is used to test linear, quadratic, cubic, and quartic models. The VAR statement is
used to list all of the variables that will be used in the models that are specified.
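Something along these lines (the model labels and the name of the dependent variable are my guesses; the ss1 and scorr1 options on the quartic model request the Type I statistics described below):

   proc reg data=ladybugs;
      var temp temp2 temp3 temp4;    /* all variables used in any of the models */
      LINEAR:    model free = temp;
      QUADRATIC: model free = temp temp2;
      CUBIC:     model free = temp temp2 temp3;
      QUARTIC:   model free = temp temp2 temp3 temp4 / ss1 scorr1;  /* Type I SS and squared semipartials */
   run;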
The LINEAR model replicates the analysis which Copp reported. Note that there is a strong (r² = .615) and significant (t = 7.79, p < .001) linear relationship between temperature and number of free ladybugs. Inspection of the residual plots, however, shows that I should have included temperature-squared in the model, making it a quadratic model. After contemplating how complex I am willing to make the model, I decide to evaluate linear, quadratic, cubic, and quartic models.
I next evaluated the QUARTIC model, requesting Type I (sequential) sums of squares and Type I squared semipartial correlation coefficients. From the output I can see that adding temperature-squared to the linear model significantly increases the R², and by a large amount, .223. Adding temperature-cubed to the quadratic model significantly increases the R², but by a small amount, .023. I ponder whether that small increase in R² justifies making the model more complex. I end up, somewhat reluctantly, keeping temperature-cubed in the model. Finally, I see that adding temperature⁴ to the cubic model increased the R² by a small and nonsignificant amount, so I revert to the cubic model.
For pedagogical purposes, I created plots of the linear model, the quadratic model, and the
cubic model.
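One easy way to get plots like these (not necessarily how the plots here were produced) is the REG statement of Proc Sgplot, which overlays a polynomial fit of whatever degree you request:

   proc sgplot data=ladybugs;
      reg x=temp y=free / degree=3;   /* degree=1 for linear, 2 for quadratic, 3 for cubic */
   run;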
The plot for the quadratic model shows that aggregation of the ladybugs is greatest at about 5 to 10 degrees Celsius (roughly 41 to 50 degrees Fahrenheit). When it gets warmer than that, the
ladybugs start dispersing, but they also start dispersing when it gets cooler than that. Perhaps
ladybugs are threatened by temperatures below freezing, so the dispersal at the coldest temperatures
represents their attempt to find a warmer place to aggregate.
The second bend in the curve provided by a cubic model is not very apparent in the plot of the
cubic model, but there is an apparent flattening of the line at low temperatures. It would be really
interesting to see what would happen if the ladybugs were tested at temperatures even lower than
those employed by Copp.
Below is an example of how to present results of a polynomial regression. I used SPSS to
produce the figure.
Forty groups of ladybugs (100 ladybugs per group) were tested at temperatures ranging
from -2 to 34 degrees Celsius. In each group I counted the number of ladybugs which were free (not
aggregated). A polynomial regression analysis was employed to fit the data with an appropriate
model. To be retained in the final model, a component had to be statistically significant at the .05
level and account for at least 2% of the variance in the number of free ladybugs. The model adopted
was a cubic model, Free Ladybugs = 13.607 + .085 Temperature - .022 Temperature² + .001 Temperature³, F(3, 36) = 74.50, p < .001, η² = .86, 90% CI [.77, .89]. Table 1 shows the contribution
of each component at the point where it entered the model. It should be noted that a quadratic model
fit the data nearly as well as did the cubic model.
Table 1

Number of Free Ladybugs Related to Temperature

Component      SS    df      t        p      sr²
Linear        853     1    7.79    < .001    .61
Quadratic     310     1    7.15    < .001    .22
Cubic          32     1    2.43      .020    .02
As shown in Figure 1, the ladybugs were most aggregated at temperatures of 18 degrees or
less. As temperatures increased beyond 18 degrees, there was a rapid rise in the number of free
ladybugs.
Current research in my laboratory is directed towards evaluating the response of ladybugs
tested at temperatures lower than those employed in the currently reported research. It is anticipated
that the ladybugs will break free of aggregations as temperatures fall below freezing, since remaining
in such a cold location could kill a ladybug.
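As a quick check on what the cubic equation reported above implies, one can plug a few temperatures into the rounded coefficients (rounded, so these predictions are only approximate):

   data _null_;
      do temp = 0, 10, 18, 34;
         free_hat = 13.607 + .085*temp - .022*temp**2 + .001*temp**3;
         put temp= free_hat= 5.1;   /* about 13.6, 13.3, 13.8, and 30.4 free ladybugs */
      end;
   run;

The predicted count stays nearly flat at the cooler temperatures and then climbs rapidly at the warm end, consistent with the interpretation above.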
Polynomial Regression with I/O Data
Megan Waggy (2014) investigated the relationships between employee commitment and
demographic variables. One of the commitment variables was continuance commitment, the
employee’s ‘need’ to stay with the organization. The employee evaluates the costs associated with
leaving the organization and if the costs are too high, the employee feels unable to leave and is found
to have high levels of continuance commitment. The costs include the personal sacrifice associated
with leaving the organization, such as loss of salary, loss of friends, and loss of job progress. These
costs may be moderated by the availability of alternative employment.
Megan’s review of the literature revealed that studies of the relationship between
organizational commitment and tenure have been mixed – sometimes positive, sometimes negative,
sometimes trivial. It occurred to me that this might result from the relationship not being linear. My thinking was that by mid-career an employee might have developed the skills and connections that make them able to get alternative employment elsewhere, and that as an employee approaches retirement the need for continued employment will drop. A re-analysis of Megan’s data confirmed the
hypothesized nonlinear relationship.
We used SPSS to conduct the analysis, entering Tenure (in years) as the sole predictor in the first block. Click Next and select Tenure squared (created with Transform, Compute). Then click Next and add Tenure cubed.
Model    R       R Square   R Square Change   F Change   df1   df2   Sig. F Change
1        .045a     .002          .002             .352     1    174       .554
2        .261b     .068          .066           12.241     1    173       .001
3        .288c     .083          .015            2.775     1    172       .098
Notice that adding tenure cubed increased the R² by a small and nonsignificant amount. Accordingly, we drop back to the quadratic model. Also notice that the strength of association is a helluva lot lower than it was with the ladybugs. Well, explaining the behavior of humans is a bit more difficult than explaining the behavior of ladybugs. ☺
ANOVAa

Model             Sum of Squares    df    Mean Square      F      Sig.
1  Regression            .243        1        .243       .352    .554b
   Residual           120.283      174        .691
   Total              120.526      175
2  Regression           8.192        2       4.096      6.308    .002c
   Residual           112.334      173        .649
   Total              120.526      175
3  Regression           9.975        3       3.325      5.173    .002d
   Residual           110.551      172        .643
   Total              120.526      175

a. Dependent Variable: Cont_Comm
b. Predictors: (Constant), Tenure_Years
c. Predictors: (Constant), Tenure_Years, Tenure2
d. Predictors: (Constant), Tenure_Years, Tenure2, Tenure3
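If you would rather script this than point and click, a roughly equivalent sequential analysis in SAS might look like this (the data set name is made up; the variable names are those shown in the SPSS output above):

   data waggy;
      set waggy;                       /* assumes Tenure_Years and Cont_Comm already exist */
      Tenure2 = Tenure_Years**2;
      Tenure3 = Tenure_Years**3;
   run;
   proc reg data=waggy;
      LINEAR:    model Cont_Comm = Tenure_Years;
      QUADRATIC: model Cont_Comm = Tenure_Years Tenure2;
      CUBIC:     model Cont_Comm = Tenure_Years Tenure2 Tenure3 / ss1 scorr1;
   run;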
Presenting the Results
Sequential polynomial regression analysis was employed to investigate the nature of the
relationship between tenure and continuance commitment. After evaluating a linear model, each
additional step involved entering the next highest power of the predictor (tenure, in years). This
continued until the addition of the next highest power increased the fit of the model to the data by an
insignificant or otherwise trivial amount. As shown in Table 1, adding a quadratic component to the
model produced a significant increase in fit, but adding a cubic component did not. Accordingly, the
quadratic model was adopted, F(2, 173) = 6.308, p = .002, R² = .068 (see Figure 1).
Table 1
Predicting Continuance Commitment from Years of Tenure
Step            R²     F for R²      df        p
1: Linear      .002      0.352     1, 174    .554
2: Quadratic   .066     12.241     1, 173    .001
3: Cubic       .015      2.775     1, 172    .098
Figure 1
Relationship Between Tenure and Continuance Commitment
References
Waggy, M. R. (2014). Self-reported changes in organizational commitment: The relationship between present organizational commitment and its perceived changes over time (Unpublished master's thesis). East Carolina University.
Back to Wuensch’s Stats Lessons Page
The ladybugs data in an Excel spreadsheet

Copyright 2014, Karl L. Wuensch, All Rights Reserved