Reference Document
Regression Analysis—Instructional Resource for Cost/Managerial
Accounting
David E. Stout
Lariccia School of Accounting & Finance
Youngstown State University
August 5, 2014
Please Do Not Quote Without Permission of the Author
1. Introduction
This document serves as a reference guide (or tool) for using the author’s “Regression
Analysis—Instructional Resource.” As explained in the companion journal article, the resource
is divided into two primary parts:

Part One deals with the use of regression analysis to estimate simple linear cost functions,
the use of Excel for estimating these functions, interpretation of regression-related output
associated with cost estimation, and alternatives for estimating costs based on a
regression model fit to a set of data. This portion of the resource consists of the following
four files:
(1) a set of PowerPoint slides (“Estimating Linear Cost Functions”) that provides an
overview of simple (one-variable) cost functions and OLS regression analysis;
(2) an Excel file (“Estimating Linear Cost Functions Using Excel”) that discusses five
Excel-based methods that can be used to estimate a simple linear cost function;
(3) a Word file (“Cost Estimation and Statistical Issues—Regression Analysis”) that
addresses three separate cost-estimation and statistical issues (five options in Excel
for generating cost estimates after a regression analysis has been performed; an
analysis of changes in the standard error of the regression, SE, as sample size, n,
changes; and, constructing confidence intervals around point estimates); and,
(4) an Excel file (“Change in SE as n increases”) that can be used in conjunction with
item (3) above.

Part Two deals with estimating one form of non-linear cost function: the incremental
unit-time learning-curve model. This portion of the instructional resource consists of the
following three files:
(1) a PowerPoint file (“Estimating Learning-Curve Cost Functions”), which provides a
review of logarithms and a discussion of common forms of learning-curve models;
(2) a Word file (“Example—Estimating a Learning-Curve Function”), which provides a
discussion of two procedures that can be used within Excel to estimate a learning-curve model; and
(3) an Excel file (“Learning Curve Analysis (Incremental Unit-Time Model)”), which
provides a worked example of using Excel to fit a learning-curve model to a set of
data and a basis for discussing the interpretation and use of the estimated coefficients
in this model.
A more detailed explanation of the above seven files is provided below.
Page 1 of 11
2. Part One: Estimating Simple Linear (One-Variable) Cost Functions
As indicated above, this portion of the resource package consists of four files: one set of
PowerPoint slides, two Excel files, and one Word document.
2.1. PowerPoint slides
The PowerPoint deck (“Estimating Linear Cost Functions”) consists of 14 numbered
slides. Slides 2 through 4 provide a broad overview of regression analysis, as applied to the task
of estimating simple (i.e., one-variable) linear cost functions. Slides 5 through 10 provide an
overview of calculating and interpreting both the standard error of the regression (SE) and the
coefficient of determination, R².¹,² In discussing these slides with students, I relate the discussion
to two statistical measures they should have learned from statistics: the mean (as a measure of
central tendency) and the standard deviation (as a measure of variability). Specifically, I relate
the determination of R² and SE to these two measures. This is an initial attempt to relate (or anchor)
the topic at hand to something students should have already studied. In my experience, the
context of cost estimation (a business-related topic) helps to “demystify” the discussion. In terms
of the mean, the point can be made to students that one option is to use the mean value of the
data set as an estimate of the dependent variable (cost), regardless of the value assumed by the
cost driver (X). Nothing prohibits the cost analyst from doing this! The R² value can then be
interpreted as the percentage reduction in the sum of squared estimation errors achieved by using
the regression equation, rather than the mean value, Ȳ, to estimate values of Y (within the set of
data). It is then possible
to indicate that, as learned from statistics, the standard deviation (or variance) for the data set at
hand is a measure of the variability (or dispersion) of the actual Y values around the mean value,
Ȳ. In similar fashion, the standard error of the regression (SE) represents the dispersion of the
actual Y values, not around the mean value of Y but around the OLS regression line.
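As a minimal Python sketch of this comparison, the code below uses the three-observation data set introduced in Section 2.3.2, (50, $250), (100, $310), and (150, $325), together with its fitted line, Y = $220 + $0.75X. It measures the squared errors made by estimating every Y with Ȳ and by estimating Y with the regression line; the variable names are illustrative only.

```python
import math

# Three (X, Y) observations and the OLS line Y = 220 + 0.75X (data from Section 2.3.2)
xs = [50, 100, 150]
ys = [250, 310, 325]
a, b = 220.0, 0.75

n = len(ys)
y_bar = sum(ys) / n  # mean-only "estimate" of cost, ignoring the cost driver X

# Squared errors when every Y is estimated by the mean (total sum of squares)
sst = sum((y - y_bar) ** 2 for y in ys)
# Squared errors when Y is estimated by the regression line (error sum of squares)
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

r_squared = 1 - sse / sst               # proportional reduction in squared error
std_dev = math.sqrt(sst / (n - 1))      # dispersion of Y around the mean
se = math.sqrt(sse / (n - 2))           # dispersion of Y around the line (k = 1)

print(f"SST = {sst:.1f}, SSE = {sse:.1f}")
print(f"R^2 = {r_squared:.4f}, SD = {std_dev:.2f}, SE = {se:.2f}")
```

With these data, R² is about 0.893: the regression line removes roughly 89% of the squared estimation error left by the mean-only estimate, and SE (about 18.37) is well below the standard deviation of Y (about 39.69).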
With this as background regarding the notion of variance (dispersion), the discussion can
then turn to the formula for calculating R².³ At this stage, it is useful to make the point that OLS
regression receives its name by virtue of how it determines the “line of best fit” through any set
1
Slides 9 and 10 refer, respectively, to the SE and R² associated with a regression analysis of the data set contained
in the Excel file “Estimating Linear Cost Functions.” As such, these two slides are linked to the related Excel file.
2
Students can access either of the following sources for an explanation of the coefficient of determination:
http://www.ehow.com/how_8241563_calculate-squared-regression.html or
http://www.ehow.com/how_5148712_calculate-r.html. Instructions for using the RSQ built-in function in Excel to estimate the
coefficient of determination are available at http://www.ehow.com/how_8498030_calculate-r2-excel.html.
3
It is also appropriate here to anticipate the discussion of the output of Excel’s REGRESSION routine in the form of
an ANOVA table. That is to say, I first attempt to ensure that students understand the notion of decomposing the
total variability of Y into explained and unexplained portions, and that, while I present to them the formulas for
calculating both the SE and R², the calculations can be done effortlessly in Excel. Put another way, I try to ensure a
conceptual understanding of the mechanics of OLS regression before transitioning to Excel and a reinforcement of
these concepts through an actual regression analysis that students can perform.
of data. That is, OLS produces the function that minimizes the sum of squared errors (and therefore, for a given
sample, also minimizes SE). Because I have previously shown students the relationship between SE and R², the
point can also be made here that OLS regression produces the cost function that maximizes the calculated R² value. Finally, the instructor can
make the point that SE is of principal interest because the information contained therein can be
used to construct confidence intervals around point estimates generated by the user’s regression
function.
Slide 11 deals with the importance of graphing the data (i.e., preparing a scatter graph or
scatter diagram) as a preliminary step to the application of regression analysis, while slide 12
presents a listing of four Excel-based methods that can be used to fit a linear regression model to
a set of data. Slides 13 and 14 complete the deck by referencing the three other files associated
with this component of the learning resource: an Excel file titled “Estimating Linear Cost
Functions Using Excel,” a Word file titled “Cost Estimation and Statistical Issues—Regression
Analysis,” and an Excel file⁴ titled “Change in SE as n increases.” Each of these three files is
explained more fully below.⁵
2.2. Excel file: Estimating linear cost functions using Excel
As indicated by the title, this file provides a comprehensive tutorial on the use of Excel
for estimating simple (i.e., one-variable) linear cost functions. A data set consisting of 14
observations (cost and associated cost-driver level) is used to illustrate the use of each of the
following five methods of fitting a linear regression model to the set of supplied data:
(1) CHART option
(2) Built-in REGRESSION routine
(3) The LINEST built-in function
(4) The SOLVER routine
(5) SLOPE, INTERCEPT, and RSQ built-in functions
Throughout the referenced Excel file, citations to online video clips and other supplementary
documents regarding regression and regression-related topics are provided for the benefit of
students. This ensures that students have a rich source of information at their disposal as they go
through the cost-estimation process.
2.2.1. Using the CHART Option in Excel
4
This Excel file is embedded, as an object, in the aforementioned Word file titled “Cost Estimation and Statistical
Issues—Regression Analysis.” This embedded file can be accessed by double-clicking on the icon.
5
If the instructor chooses to use fewer than the full complement of files for this component of the resource, these last
two PowerPoint slides would have to be adjusted accordingly.
The CHART option⁶ requires students to prepare and properly label (using whatever
enhancements they deem appropriate) a scatter plot of the data. The instructor at this point can
query students as to whether, based on the scatter plot of the data, it makes sense to fit a linear
equation to the data set. Following this, students use the “Add Trendline” option to fit a linear
equation to the data set.⁷ Finally, students are instructed to use the trendline options to display the
estimated cost function (the trendline equation) and its associated coefficient of determination (R²) on the chart itself.⁸
During class, the instructor can (after clicking anywhere within the constructed chart) go through
selected options under “Chart Tools” (under any of three categories: “Design,” “Layout,” and
“Format”).
2.2.2. Using the REGRESSION Routine in Excel
Students then proceed to use the REGRESSION routine in Excel, which is accessed by
going to “Data” and then “Data Analysis.”⁹ Summary output consists of the coefficient estimates
(for the variable cost rate, b, and the fixed-cost component, a), R² for the estimated cost function,
and a complete ANOVA (Analysis of Variance) table.¹⁰ The Excel file includes a rich set of
supplementary notes, designed principally to “demystify” the discussion for students. That is, I
show in the Excel file precisely how the REGRESSION routine generated estimates of R² and
SE.¹¹ As such, the discussion here can be linked back to the initial set of PowerPoint slides
(discussed above). Students generally find this discussion illuminating. At the instructor’s
discretion, the class can then discuss what I tell students are “sampling-related” issues:
6
Tutorials for using the CHART option in Excel are available at http://office.microsoft.com/en-us/excel-help/how-to-create-a-basic-chart-in-excel-2010-RZ102559017.aspx?CTT=1&client=1, www.ehow.com/video_12309398_insert-chart-excel-2010.html, and at http://www.dummies.com/how-to/content/the-essentials-of-working-with-excel-2010-charts.html.
7
Tutorials for adding a trend line to an Excel chart are available at
http://www.youtube.com/watch?v=ExfknNCvBYg, http://www.ehow.com/how_7224125_graph-trend-analysis-microsoft-excel.html, and at http://www.ehow.com/how_11403400_draw-trendline-excel.html.
8
See http://www.ehow.com/how_12115127_calculate-r-squared-measurements-excel.html. As noted above in
footnote 2, the coefficient of determination (R²) for a linear regression model can also be generated through
application of the RSQ function built into Excel.
9
Clips presenting an introduction to ordinary least squares (OLS) regression analysis are available to students at:
http://www.youtube.com/watch?v=ZkjP5RJLQF4, http://www.youtube.com/watch?v=Qa2APhWjQPc, and
http://www.youtube.com/watch?v=kHZBy1uVNnM. Tutorials for using the REGRESSION routine in Excel are
available at: http://www.youtube.com/watch?v=ExfknNCvBYg (this clip shows both the use of the CHART
function and the REGRESSION function in Excel), http://blog.yojimbocorp.com/2012/05/03/linear-regression-with-excel-2010/, http://www.wikihow.com/Run-Regression-Analysis-in-Microsoft-Excel, and http://www.excel-easy.com/examples/regression.html.
10
Clips providing a discussion of regression-related output are available at
http://www.youtube.com/watch?v=8R6UcK91Cec, http://www.youtube.com/watch?v=c5blVUkkjTM, and
http://www.youtube.com/watch?v=aq8VU5KLmkY.
11
Clips providing a discussion of SE and R², respectively, are available at
http://www.youtube.com/watch?v=dJR1WqeBgCg and http://www.youtube.com/watch?v=aq8VU5KLmkY.
testing for the statistical significance of R² (which, in a simple regression, is equivalent to a test
on the slope coefficient, that is, the variable cost rate, b),¹² the reliability of the estimated slope
coefficient (b), and the construction of confidence intervals around the coefficient estimates.
Obviously, the instructor is free to cover as much or as little of this material as desired.
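The sampling-related test described in footnote 12 can also be sketched outside Excel. The Python below is a minimal sketch, assuming a one-variable model: it rebuilds the ANOVA quantities from the three-observation data set of Section 2.3.2 and computes a right-tail F probability intended to mirror Excel's F.DIST.RT. The F probability is evaluated through the regularized incomplete beta function using a standard continued-fraction method; that numerical technique and all function names are assumptions of this sketch, not part of the Excel file.

```python
import math

def ols_anova(xs, ys):
    """Simple OLS fit plus the ANOVA quantities shown in REGRESSION output (k = 1)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                                   # variable cost rate
    a = y_bar - b * x_bar                           # fixed-cost component
    sst = sum((y - y_bar) ** 2 for y in ys)         # total variability of Y
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
    ssr = sst - sse                                 # explained
    df_reg, df_err = 1, n - 2                       # k and n - k - 1
    msr, mse = ssr / df_reg, sse / df_err
    return {"a": a, "b": b, "SSR": ssr, "SSE": sse, "SST": sst,
            "df_reg": df_reg, "df_err": df_err, "F": msr / mse,
            "R2": ssr / sst, "SE": math.sqrt(mse)}

def _beta_cf(a, b, x, max_iter=300, eps=1e-15):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd terms of the continued fraction for step m
        for aa in (m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:                  # last factor is ~1: converged
            break
    return h

def reg_incomplete_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b

def f_dist_rt(f, dfn, dfd):
    """Right-tail F probability, meant to mirror Excel's =F.DIST.RT(F, dfn, dfd)."""
    return reg_incomplete_beta(dfd / 2.0, dfn / 2.0, dfd / (dfd + dfn * f))

stats = ols_anova([50, 100, 150], [250, 310, 325])
p_value = f_dist_rt(stats["F"], stats["df_reg"], stats["df_err"])
print(f"R^2 = {stats['R2']:.4f}, F = {stats['F']:.4f}, p = {p_value:.4f}")
```

For these three observations, F is about 8.33 with (1, 1) degrees of freedom and the p-value is roughly 0.21, so even a high R² is not statistically significant at conventional levels when there are so few observations.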
2.2.3. Using the LINEST function in Excel
A third option for fitting a linear model to the data set consists of using the LINEST
function in Excel.¹³ I sometimes cover this option in class principally because, to use this
function, students must enter the formula as an array formula. The advantages of exposing
students to the LINEST function in Excel are threefold: one, they are exposed to the process of
entering a formula as an array (something the students could leverage and apply in other
contexts within Excel);¹⁴ two, use of the LINEST function in conjunction with either (or both) of
the methods discussed above makes the point to students that in Excel there are different
approaches or options to accomplishing designated tasks, a lesson that may be useful later in
their studies and/or in professional practice; and three, it allows students to see that the
regression results (estimated coefficients, standard errors of the coefficients, SE, R², etc.) are
consistent across methods used.¹⁵ Arrows inserted into the file allow students to readily “see”
this consistency and therefore, it is hoped, demystify the process for them.
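Because LINEST returns its statistics unlabeled (see footnote 15), a labeled emulation may help students map each cell. The Python sketch below reproduces, for one independent variable, the 5 × 2 layout that =LINEST(known_y's, known_x's, TRUE, TRUE) returns: slope and intercept; their standard errors; R² and SE; F and residual degrees of freedom; and the regression and error sums of squares. The function name and the three-observation data set (from Section 2.3.2) are illustrative assumptions.

```python
import math

def linest_stats(known_ys, known_xs):
    """Emulate, with row labels, the 5 x 2 array of =LINEST(known_y's, known_x's, TRUE, TRUE)
    for a single independent variable."""
    n = len(known_xs)
    x_bar = sum(known_xs) / n
    y_bar = sum(known_ys) / n
    sxx = sum((x - x_bar) ** 2 for x in known_xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(known_xs, known_ys))
    b = sxy / sxx                                      # slope: variable cost rate
    a = y_bar - b * x_bar                              # intercept: fixed-cost component
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(known_xs, known_ys))
    sst = sum((y - y_bar) ** 2 for y in known_ys)
    ssr = sst - sse
    df = n - 2                                         # residual degrees of freedom (k = 1)
    se_y = math.sqrt(sse / df)                         # standard error of the regression, SE
    se_b = se_y / math.sqrt(sxx)                       # standard error of the slope
    se_a = se_y * math.sqrt(1 / n + x_bar ** 2 / sxx)  # standard error of the intercept
    return [
        ("coefficients (b, a)", b, a),
        ("std. errors (se_b, se_a)", se_b, se_a),
        ("R^2, SE", ssr / sst, se_y),
        ("F, df", (ssr / 1) / (sse / df), df),
        ("SSR, SSE", ssr, sse),
    ]

rows = linest_stats([250, 310, 325], [50, 100, 150])
for label, left, right in rows:
    print(f"{label:<26}{left:>12.4f}{right:>12.4f}")
```

As with LINEST itself, the slope appears in the left column and the intercept in the right column of the first row.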
12
As shown in the Excel file, students can use the F.DIST.RT function in Excel to test for the statistical significance
of the calculated R-squared value. The null hypothesis is that the population R-squared value is zero. The question
we address is: given the sample, what is the probability of observing an R-squared value at least this large if the population value for R-squared is zero? Students see from the Regression output that this test is an F-test. To conduct this test, we need three
pieces of information: the F-statistic (from the Regression routine) and its associated degrees of freedom (both
numerator and denominator). The F-statistic is defined as the ratio of the MSR (mean square regression) to the MSE
(mean square error); the numerator degrees of freedom = k; the denominator degrees of freedom = n – k – 1 (where k
= the number of independent variables in the regression model) (Anderson et al., 1987, pp. 479-480). These three
pieces of information are then entered into the following formula (which is pasted into an open cell):
=F.DIST.RT(F,dfn,dfd), where F = the calculated F-statistic from the Regression output, dfn = numerator degrees of
freedom, and dfd = denominator degrees of freedom. The resulting value represents the probability of observing an F-statistic equal to F (or higher) if the population value for R-squared is zero. Basically, large values of the F-statistic
cast doubt on the null hypothesis that the population R-squared value is zero (or, equivalently in the case of a simple
regression model, that the population value of the slope coefficient in the cost equation is zero). An explanation of
the F.DIST.RT built-in function in Excel is available at: http://www.excelfunctions.net/Excel-F-Dist-Rt-Function.html and http://msdn.microsoft.com/en-us/library/office/ff196140.aspx.
13
Useful supplementary sources for the LINEST function in Excel include the following:
http://www.ehow.com/how_8454834_use-linest-excel.html; http://www.youtube.com/watch?v=K1uelkQ6D-o;
http://www.youtube.com/watch?v=6wbcPbYbq6M; and, http://www.techonthenet.com/excel/formulas/linest.php.
14
The following clip provides background information on the use of array formulas and functions in Excel:
http://www.youtube.com/watch?v=F2iS8fiqLao. Additional tutorials on the use of array formulas are available at
http://office.microsoft.com/en-us/starter-help/create-or-delete-a-formula-HP010342373.aspx?CTT=1&client=1#_Toc251333379 and at http://www.dummies.com/how-to/content/how-to-build-an-array-formula-in-excel-2010.html.
15
Note that while the LINEST function in Excel generates all of this output, the individual components of the output
are not labeled by Excel. For this reason, I have inserted into the Excel file “Estimating linear cost functions using
Excel” boxes containing pertinent labels.
2.2.4. Using the SOLVER routine in Excel
Elsewhere, in both the undergraduate cost accounting course and in the MBA managerial
accounting course I teach, I expose students to the use of the SOLVER routine in Excel.¹⁶
Earlier in the regression module, students learn that OLS “works” by choosing the cost
coefficients in a linear equation (i.e., a and b) such that the resulting equation minimizes the sum of squared errors (and hence the SE).
This point is driven home to students by having them structure and solve an optimization
problem. Specifically, the SOLVER routine can be used to choose the two cost
coefficients such that the resulting “sum of squared error terms” (and hence the SE) is minimized.
Students “prove” this by using the SOLVER routine to generate the coefficient estimates.
They readily see that the coefficient estimates they generate using the SOLVER routine are
precisely the same as those obtained using the alternative methods discussed above. As well,
students “see” that the resulting error sum of squares (ESS) produced by the SOLVER routine is
exactly the same as the ESS produced by the other methods. In this sense, then, students are
better able to understand the “background mechanics” of the OLS method. I generally conclude
this part of the in-class lecture by pointing out to students that the SOLVER routine will be used
later in the course (determining optimal short-term product/service mix, as noted above) and in
the operations management course they are taking (or will eventually take).
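The same idea can be shown with an iterative search outside Excel. The sketch below does not reproduce SOLVER's own search algorithm; instead, as a stand-in, it uses cyclic coordinate descent: it repeatedly picks the intercept a that minimizes the sum of squared errors for the current b, then the b that minimizes it for the new a. Started from arbitrary values, the search settles on the same coefficients as OLS. The function name, starting values, tolerance, and data (the three observations from Section 2.3.2) are assumptions.

```python
def minimize_sse(xs, ys, tol=1e-10, max_sweeps=100_000):
    """Choose a and b to minimize the sum of squared errors by cyclic coordinate descent.
    This is a stand-in for a SOLVER-style search, not SOLVER's own algorithm."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    a, b = 0.0, 0.0                                  # arbitrary starting values
    for _ in range(max_sweeps):
        a_new = (sum_y - b * sum_x) / n              # best a given the current b
        b_new = (sum_xy - a_new * sum_x) / sum_xx    # best b given the new a
        converged = abs(a_new - a) < tol and abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse

a, b, sse = minimize_sse([50, 100, 150], [250, 310, 325])
print(f"a = {a:.4f}, b = {b:.4f}, SSE = {sse:.2f}")
```

For the three observations it returns a of about 220, b of about 0.75, and an SSE of about 337.5, matching the closed-form OLS results.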
2.2.5. Using the SLOPE, INTERCEPT, and RSQ functions in Excel
Finally, I demonstrate to students that the built-in functions SLOPE, INTERCEPT, and
RSQ in Excel can be used to produce regression results equivalent to those generated by
application of the preceding methods.
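For readers who want to see what these three functions compute, the Python functions below are minimal analogues, assuming one independent variable: SLOPE is the sum of cross-products of deviations divided by the sum of squared X deviations, INTERCEPT is Ȳ minus the slope times the mean of X, and RSQ is the square of the Pearson correlation between X and Y. The function names simply echo Excel's; they are not an Excel API, and the data are the three observations from Section 2.3.2.

```python
import math

def _centered_sums(known_ys, known_xs):
    """Return (Sxy, Sxx, Syy): sums of cross-products and squares of deviations from the means."""
    n = len(known_xs)
    x_bar, y_bar = sum(known_xs) / n, sum(known_ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(known_xs, known_ys))
    sxx = sum((x - x_bar) ** 2 for x in known_xs)
    syy = sum((y - y_bar) ** 2 for y in known_ys)
    return sxy, sxx, syy

def slope(known_ys, known_xs):
    """Analogue of =SLOPE(known_y's, known_x's)."""
    sxy, sxx, _ = _centered_sums(known_ys, known_xs)
    return sxy / sxx

def intercept(known_ys, known_xs):
    """Analogue of =INTERCEPT(known_y's, known_x's): mean of Y minus slope times mean of X."""
    n = len(known_xs)
    return sum(known_ys) / n - slope(known_ys, known_xs) * sum(known_xs) / n

def rsq(known_ys, known_xs):
    """Analogue of =RSQ(known_y's, known_x's): squared Pearson correlation of X and Y."""
    sxy, sxx, syy = _centered_sums(known_ys, known_xs)
    r = sxy / math.sqrt(sxx * syy)
    return r * r

ys, xs = [250, 310, 325], [50, 100, 150]
print(slope(ys, xs), intercept(ys, xs), round(rsq(ys, xs), 4))
```

The squared correlation equals 1 − SSE/SST for a fitted line with an intercept, which is why RSQ agrees with the R² reported by the other methods.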
2.3. Word file: Cost estimation and statistical issues—regression analysis
This file covers three topics: alternative cost-estimating procedures in Excel (i.e.,
alternative methods of estimating total cost, Y, given a value of the cost driver, X, after
estimating a linear cost function); an analysis of changes in the standard error of the regression
(SE) as the sample size, n, changes; and, building confidence intervals around point estimates for
Y given a value of X. Each of these topics can be covered independently of the others, or skipped
entirely if there are time constraints. As is the case with the aforementioned Excel file, the Word
document contains numerous references to online resources (video clips, documents, etc.) that
provide a rich source of information to students as they expand their knowledge of the above set
of three topics.
16
SOLVER is used in conjunction with the topic of choosing an optimum short-term product (or service) mix.
Specifically, SOLVER is used to structure and solve a “constrained optimization” problem. Background information
regarding the SOLVER routine in Excel is available at http://blogs.office.com/b/microsoftexcel/archive/2009/09/21/new-and-improved-solver.aspx and http://www.excel-easy.com/data-analysis/solver.html;
the following video clips also provide useful information: http://www.youtube.com/watch?v=eQoPjlnuZ6o,
http://www.youtube.com/watch?v=K4QkLA3sT1o, and http://www.youtube.com/watch?v=9G3MjOunLqQ.
2.3.1. Generating estimated cost data
As indicated in the Word file, there are five cost-estimation options in Excel:
(1) Trend Line Approach: after preparing a CHART from the user’s data set, the “Add Trend Line” option can
be used to forecast backwards or forwards; this approach simply extends the Trend Line that is
constructed using the CHART Option in Excel.
(2) Equation Approach: here there are two options. Students can enter into a cell an equation that includes
the regression coefficients generated using any of the four methods discussed earlier; alternatively, students
can use the INDEX(LINEST…) function in Excel to generate and place in a designated cell the slope coefficient
(variable cost rate, b) and a separate INDEX(LINEST…) function to generate and place in a designated cell the
fixed-cost component of the cost function, a.¹⁷
(3) Trend Function: the values of the cost driver, X, for which estimated costs, Y, are to be generated (based
on a linear regression analysis) are included as one of the arguments in the TREND function, the formula for
which is pasted into an open cell.¹⁸
(4) Formula Approach: the general formula, entered as an array, is =SUM({b,a}*{x,1}), where b = the slope
coefficient (variable cost rate), a = the fixed-cost component in the total cost function, and x = the
cost-driver value for which an estimate of total cost, Y, is needed. To calculate the two coefficients, the
aforementioned LINEST function can be used, in which case the above formula is rewritten as
=SUM(LINEST(B12:B19,A12:A19)*{11,1}), again entered as an array formula, where cells B12:B19 contain the
values of the dependent variable, Y, and A12:A19 contain the related cost-driver amounts, X. The formula
returns the estimated total cost, Y, for X = 11, based on a linear equation fit to the data set found in A12:B19.
(5) Using the FORECAST Function:¹⁹ the formula =FORECAST(x, known_y’s, known_x’s) can be used to forecast a
value of Y (total cost) for a value of x (cost driver) after implicitly fitting a linear function to a data set.
The FORECAST function is similar to TREND, except that it is used to generate a single estimate, rather than
an array of estimates (which could also be a single point).
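The estimation options above can be mimicked in a few lines of Python. This is a sketch under stated assumptions: the function names echo Excel's FORECAST and TREND (keeping Excel's argument order, y-values before x-values) but are hypothetical stand-ins, and the three observations from Section 2.3.2 replace the A12:B19 range, whose values are not reproduced in this document. As in the formula approach, the coefficients are kept in LINEST's order, {b, a}, and multiplied by {x, 1}.

```python
def fit_line(known_ys, known_xs):
    """Return (b, a) in the order of LINEST's first row: slope, then intercept."""
    n = len(known_xs)
    x_bar, y_bar = sum(known_xs) / n, sum(known_ys) / n
    sxx = sum((x - x_bar) ** 2 for x in known_xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(known_xs, known_ys))
    b = sxy / sxx
    return b, y_bar - b * x_bar

def forecast(x, known_ys, known_xs):
    """Analogue of =FORECAST(x, known_y's, known_x's): a single estimate of Y."""
    b, a = fit_line(known_ys, known_xs)
    return a + b * x

def trend(known_ys, known_xs, new_xs):
    """Analogue of =TREND(known_y's, known_x's, new_x's): one estimate per new X value."""
    b, a = fit_line(known_ys, known_xs)
    return [a + b * x for x in new_xs]

def formula_approach(x, known_ys, known_xs):
    """Analogue of =SUM(LINEST(...)*{x,1}): multiply {b, a} by {x, 1} and add."""
    coefficients = fit_line(known_ys, known_xs)
    return sum(c * w for c, w in zip(coefficients, (x, 1)))

ys, xs = [250, 310, 325], [50, 100, 150]
print(forecast(120, ys, xs), trend(ys, xs, [120, 200]), formula_approach(120, ys, xs))
```

All three approaches return the same estimate, 310, for X = 120, because each evaluates the same fitted line, Y = 220 + 0.75X.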
2.3.2. Analysis of changes in SE as n changes
Students may ask: “How can we, as cost analysts, decrease the standard error of the
regression, SE?” In general, this question seems to emanate from a desire to generate a more
17
Useful tutorials regarding the use of the INDEX function in Excel include:
http://www.techonthenet.com/excel/formulas/index_function.php, http://office.microsoft.com/en-us/excel-help/video-index-function-VA102581137.aspx?CTT=1&client=1, or
http://spreadsheets.about.com/od/lookupfunction1/ss/2011-03-02-excel-2010-index-function.htm. A useful
explanatory source for using the LINEST function within INDEX is: http://www.mrexcel.com/forum/excel-questions/619168-having-trouble-understanding-linest-within-index.html.
18
Background material regarding the use of the TREND function in Excel is available at:
http://support.microsoft.com/kb/828801, http://www.excelfunctions.net/Excel-Trend-Function.html,
http://www.ehow.com/how_2105842_use-excels-trend-function.html, and http://www.ehow.com/how_5844333_use-trend-excel.html.
19
A discussion of the use of the FORECAST function in Excel can be found at
http://www.techonthenet.com/excel/formulas/forecast.php and at http://support.microsoft.com/kb/828236.
accurate cost function. Students might pose the question as: “If the line of best fit is determined
by minimizing the SE, what strategies are available for decreasing SE?” To address this issue, I
created (and embedded in the Word document as an object, as noted above) an Excel file titled
“Change in SE as n increases.”
The discussion here begins with a presentation of the formula for calculating SE (i.e., the
square root of the “mean squared error” (MSE), where MSE = the error sum of squares ÷ degrees
of freedom, n – k – 1, where k = number of independent variables and n = sample size). From
this, students can see that SE can be decreased either by decreasing the numerator in the MSE
calculation OR by increasing the denominator (or by doing both). To illustrate these two points, I
begin by calculating in the Excel file the SE associated with the following three observations
(X,Y): (50, $250), (100, $310), and (150, $325). The generated regression equation based on this
set of three data points is: Y = $220 + $0.75X.
The base-case situation in the Excel file “Change in SE as n increases” indicates that for
the preceding cost function, SE = 18.37 (i.e., √(337.5/1)). To show students how SE decreases as
the sample size, n, increases (ceteris paribus), I then replicate the data set in the Excel file so
there are six (rather than three) data points, which produces an SE of 12.99 (i.e., √(675.0/4)). I
tell students this is a “denominator effect.” Finally, to demonstrate the “numerator effect,” I hold
constant the fact that there are six (rather than the initial three) observations. However, this time
instead of replicating the initial data set, I generate in the Excel file three “abnormal”
observations. For this assumed data set, the SE increases to 24.94 (i.e., √(2,487.5/4)), in spite of
the fact that the sample size doubled. Thus, students should come to realize that increasing
the sample size does not necessarily decrease the SE! Finally, data are provided in the Excel file
to show that whether and to what extent SE changes as n (the sample size) changes is a function
of the rate of change in the numerator of the SE calculation relative to the rate of change in the
denominator of the calculation. That is, the change is jointly a function of a “numerator effect”
and a “denominator effect.”
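The numerator and denominator effects can be checked with a short Python sketch. It refits the three-observation base case and the replicated six-observation case and converts each error sum of squares to SE. The three “abnormal” observations are not listed in this document, so for that case the sketch uses only the reported error sum of squares (2,487.5) rather than inventing data. Function names are illustrative.

```python
import math

def se_from_sse(sse, n, k=1):
    """SE = sqrt(MSE), where MSE = SSE / (n - k - 1)."""
    return math.sqrt(sse / (n - k - 1))

def ols_sse(xs, ys):
    """Error sum of squares for a simple OLS line fit to (xs, ys)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

base_x, base_y = [50, 100, 150], [250, 310, 325]
sse_base = ols_sse(base_x, base_y)              # n = 3, df = 1
sse_double = ols_sse(base_x * 2, base_y * 2)    # replicated data: n = 6, df = 4

print(f"base case:  SE = {se_from_sse(sse_base, 3):.2f}")
print(f"replicated: SE = {se_from_sse(sse_double, 6):.2f}")
# Only the reported SSE of the 'abnormal' case is used; its data are not reproduced here
print(f"abnormal:   SE = {se_from_sse(2487.5, 6):.2f}")
```

Replicating the data doubles SSE (337.5 to 675.0) but quadruples the degrees of freedom (1 to 4), so SE falls; the abnormal case raises SSE far faster than the degrees of freedom, so SE rises.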
2.3.3. Constructing confidence intervals around point estimates of the dependent variable
The third and final topic addressed in the supplementary Word document deals with the
construction of confidence intervals around point estimates generated by the cost function that
students develop. The discussion is subdivided into a technical analysis and a more practical
approximation approach. In my opinion, the discussion of developing
confidence intervals around point estimates is particularly relevant for MBA students.
I generally begin the discussion by reminding students that the regression model, as good
as it might be, still produces estimates of costs. Such “future values” are subject to uncertainty.
One way to capture this uncertainty is through the use of confidence intervals. In fact, I generally
make the point to my graduate students that supplying point estimates alone is unwise, in large part
because those estimates fail to capture the full information set from the data points used to
estimate the regression model.
The technical analysis of confidence-interval construction begins with a discussion of
variance; this time, however, I tell students that we need to estimate the variance associated with
the regression-predicted value of Y. In addition, in building our confidence interval we need to
specify a confidence level (e.g., 90% or 95%), which in turn is reflected in a t value (with n – 2
degrees of freedom). The Word document I created includes a table of t values for different
confidence levels and degrees of freedom (df).²⁰ Discussion then turns to the approximation
approach, which uses SE as a substitute for the more precise and conceptually correct, but
more tedious to compute, standard error (the square root of the variance) of the estimate of an
individual value of Y for a given value of X.
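The difference between the technical and approximate approaches can be sketched as follows, assuming the standard simple-regression formula for an interval around an individual Y: Ŷ ± t × SE × √(1 + 1/n + (x₀ − mean of X)²/Sxx), where Sxx is the sum of squared deviations of X from its mean, versus the approximation Ŷ ± t × SE. The data (the replicated six-observation set from Section 2.3.2), the value x₀ = 120, and the tabled t value of 2.776 (95%, 4 df) are illustrative choices.

```python
import math

def prediction_interval(x0, xs, ys, t_crit):
    """Interval for an individual Y at X = x0 from a simple OLS fit.
    Returns (point estimate, exact half-width, approximate half-width using SE alone)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(sse / (n - 2))
    y_hat = a + b * x0
    # Standard error of an individual prediction grows with the distance of x0 from x_bar
    se_pred = se * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    return y_hat, t_crit * se_pred, t_crit * se

xs = [50, 100, 150] * 2
ys = [250, 310, 325] * 2
# 2.776: tabled two-tailed 95% t value for n - 2 = 4 df (approximately =T.INV.2T(0.05, 4))
y_hat, exact_hw, approx_hw = prediction_interval(120, xs, ys, t_crit=2.776)
print(f"95% interval (exact):       {y_hat - exact_hw:.2f} to {y_hat + exact_hw:.2f}")
print(f"95% interval (approximate): {y_hat - approx_hw:.2f} to {y_hat + approx_hw:.2f}")
```

The exact interval is always somewhat wider than the approximation, and the gap grows as x₀ moves away from the mean of X, which is why the SE-based shortcut works best near the center of the data.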
2.4. Excel file: Change in SE as n increases
As noted above, this file provides a simple example of how SE changes in response to
changes in the sample size of the data set, n, and in response to changes in the underlying linear
“fit” of the model. As also noted above, this file is embedded as an object in the Word document
titled “Cost Estimation and Statistical Issues—Regression Analysis.”
3. Part Two: Estimating Learning-Curve Functions (Incremental Unit-Time Model)
The second major component of the learning resource deals with the estimation of a
particular type of non-linear function: a learning curve. I focus specifically on fitting what is
known as the “incremental unit-time model” to a set of observations. As noted in Section 1 of this
document, there are three related files: a set of PowerPoint slides (“Estimating Learning-Curve
Cost Functions”), a Word document (“Example—Estimating a Learning-Curve Function”), and
an Excel file (“Learning Curve Analysis (Incremental Unit-Time Model)”).
3.1. PowerPoint slides
The PowerPoint deck consists of six numbered slides. Slide one, as a prelude to
presenting the learning-curve model, is used as the basis for presenting to students a refresher on
logarithms. I find, from experience, that many (if not most) students have little-to-no recollection
of this topic. I tell students at this point that background information regarding logarithms is
fundamental to our ability to generate more sophisticated cost-prediction models, represented in
the present context as a learning-curve function. In slide #2 I introduce students to the following
learning-curve model: Y = aX^b, where b = the learning-curve index (i.e., Log(LCR)/Log(2)). I
20
Alternatively, the T.INV.2T built-in function in Excel can be used to generate the critical t value for constructing
a confidence interval. For example, the critical t value for a two-tailed t-distribution, for α = 0.05 (i.e., a 95%
confidence interval) and df = 8, is 2.306004 (=T.INV.2T(0.05,8)), the same value found in the table referenced above. See:
http://www.excelfunctions.net/Excel-T-Inv-2t-Function.html or http://msdn.microsoft.com/en-us/library/office/ff821541.aspx.
indicate to students that there are two forms of this general model: the cumulative-average time
model and the incremental unit-time model. Slides 3 and 4 are designed to have students think
about the range of possible values for the exponent, b, in each of these two forms of the model;
in other words, I try to get them to think about what is necessary for the preceding equation to
capture learning effects (efficiency gains associated with experience in a process). The goal is to
have students gain a conceptual understanding about the nature of a particular type of non-linear
cost function, as well as a “feel” for each of two forms of the learning-curve model observed
most often in business. The deck of slides ends with a call to explore (via the prepared Word
document and the associated Excel file) the estimation and use of the “Incremental Unit-Time
Model.”
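A short Python sketch can make the role of b concrete. Assuming a hypothetical first-unit time of 100 hours and an 80% learning-curve rate, it computes b = Log(LCR)/Log(2) and shows that each doubling of cumulative output multiplies the incremental unit time by the LCR. A negative b therefore signals learning, while b = 0 (LCR = 100%) means no learning. All numbers here are illustrative.

```python
import math

def learning_index(lcr):
    """b = Log(LCR) / Log(2); the log base cancels, so log10 or ln gives the same b."""
    return math.log10(lcr) / math.log10(2)

def unit_time(x, a, b):
    """Incremental unit-time model: time (or cost) of the x-th unit, Y = a * X**b."""
    return a * x ** b

a = 100.0                     # hypothetical time for the first unit
b = learning_index(0.80)      # hypothetical 80% learning-curve rate, so b is about -0.3219
for x in (1, 2, 4, 8):
    # each doubling of X multiplies Y by the LCR (0.80 here)
    print(x, round(unit_time(x, a, b), 2))
```

Trying an LCR of 1.0 gives b = 0 and a constant unit time, which is a quick way to show students why b must be negative for the equation to capture learning effects.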
3.2. Word document: “Example—Estimating a learning-curve function”
This document begins with a set of 14 observations (values of X and Y) and a plot of this
set of data (created as an Excel CHART).²¹ This is followed by a two-page expanded discussion
of issues raised on the PowerPoint slides (viz., learning-curve theory, general functional form of
the learning-curve model used in business, and explanation of the “incremental unit-time
learning curve model”). Finally, two approaches to fitting a learning-curve model using Excel
are offered to students: a six-step process whereby an OLS regression is fit to log-transformed
data, and the built-in Power trendline option in Excel, which is accessed by first graphing the data
set, adding a Trendline, then choosing “Power” under “Trendline Options.” The former approach
is detailed in the companion Excel file titled “Learning Curve Analysis (Incremental Unit-Time
Model),” while the latter approach is in the Word document itself. Students will see that these two
approaches are equivalent (i.e., they result in the same estimated learning-curve model for the
given data set). Both approaches are, in fact, rather straightforward extensions to the linear
modeling exercise completed in Part One of the instructional resource.
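The log-transform approach can be sketched in Python as follows. Because the 14 observations from Stout and Juras (2009) are not reproduced here, the sketch generates hypothetical, noise-free data from a first-unit time of 50 and a 75% learning curve, fits an OLS line to (log10 X, log10 Y), and back-transforms the coefficients. With noise-free data the fit recovers the generating values exactly, which makes it a convenient check; real data would not fit perfectly.

```python
import math

def fit_learning_curve(xs, ys):
    """Fit Y = a * X**b by OLS on log10-transformed data, then back-transform.
    Returns (a, b, LCR)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    b = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
         / sum((u - mx) ** 2 for u in lx))
    intercept = my - b * mx          # estimated Log(Y) when X = 1
    a = 10 ** intercept              # back-transform, like =POWER(10, Intercept)
    lcr = 10 ** (b * math.log10(2))  # learning-curve rate implied by b
    return a, b, lcr

# Hypothetical, noise-free data generated from a = 50 and a 75% learning curve
true_b = math.log10(0.75) / math.log10(2)
units = list(range(1, 15))
hours = [50 * x ** true_b for x in units]

a_hat, b_hat, lcr_hat = fit_learning_curve(units, hours)
print(f"a = {a_hat:.3f}, b = {b_hat:.4f}, LCR = {lcr_hat:.2%}")
```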
3.3. Excel file: “Learning Curve Analysis (Incremental Unit-Time Model)”
As noted above, this file shows how to use the REGRESSION function in Excel to fit a
learning-curve model to log-transformed data (i.e., both X and Y values are log-transformed).
The only “trick” is converting the resulting regression-estimated coefficients back to “regular”
numbers. Under the assumption that log10 was used to transform the data before running the
REGRESSION function, the time (or cost) of the first unit, a, is found by inserting the following
formula into an open cell: =POWER(10,Intercept),²² where “Intercept” = the estimated value of
21
Note that the data set used in part one of the tutorial (simple linear regression) differs from the data set used in
part two (learning-curve analysis). The data set used in part two is, in fact, from the case by Stout and Juras (2009),
which provides a much more detailed and comprehensive analysis than the introductory analysis contained in the
present instructional resource.
22
The POWER function in Excel is discussed in http://www.itechtalk.com/thread10266.html and in
http://www.techonthenet.com/excel/formulas/power.php.
Log(Y) when X = 1 (i.e., when Log(X) = 0), as produced by the REGRESSION function, while the learning-curve index (b) is
given as the estimated coefficient associated with the log-transformed independent variable,
Log(X). Finally, the learning-curve rate (LCR) can be estimated by using the following function:
=POWER(10,(LOG(X)*LOG(2))), where LOG(X) is the regression-generated coefficient
associated with the log-transformed independent variable, Log(X).²³ That is, LCR = 10 to the
POWER of “LOG(X)*LOG(2).” For example, if b = −0.430446, then LCR = 10^(−0.430446 × Log(2)) =
10^(−0.430446 × 0.30103) = 10^(−0.12958) = 74.20%.
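The conversions above can be verified numerically. This minimal Python sketch uses the index b = −0.430446 and a = 25.028 reported in this document; the two intercepts are back-derived from a for illustration and are not the regression output itself. It shows that the log10 and natural-log routes give the same LCR and the same a, because both reduce to 2 raised to the power b and to a itself.

```python
import math

b = -0.430446                           # learning-curve index reported in the text

lcr_log10 = 10 ** (b * math.log10(2))   # =POWER(10, b*LOG(2)) for log10-transformed data
lcr_ln = math.exp(b * math.log(2))      # =EXP(b*LN(2)) for natural-log-transformed data
lcr_direct = 2 ** b                     # both routes reduce to 2 raised to the power b

# Back-transforming an intercept; intercepts here are derived from a = 25.028 for illustration
a = 25.028
intercept_log10 = math.log10(a)         # stands in for the Log(Y)-on-Log(X) intercept
intercept_ln = math.log(a)              # stands in for the Ln(Y)-on-Ln(X) intercept
a_from_log10 = 10 ** intercept_log10    # =POWER(10, Intercept)
a_from_ln = math.exp(intercept_ln)      # =EXP(Intercept)

print(f"LCR: {lcr_log10:.2%} {lcr_ln:.2%} {lcr_direct:.2%}")
print(f"a:   {a_from_log10:.3f} {a_from_ln:.3f}")
```

The slope b itself is unchanged by the choice of log base, because rescaling both Log(X) and Log(Y) by the same constant leaves their ratio intact.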
23
If the original data set were converted to natural logs (Ln), then the “a” term in the learning-curve model would be
estimated by raising e (the base of the natural logarithm) to the power of the estimated intercept term. In Excel, this
is accomplished by inserting the following formula into an open cell: =EXP(Intercept), where
Intercept is the coefficient estimated by the regression of Ln(Y) on Ln(X). (The EXP function in Excel is discussed
in http://office.microsoft.com/en-us/excel-help/exp-function-HP010342500.aspx.) To estimate the learning-curve
rate (LCR) when natural logs (Ln) were used to transform the original data set, the following formula should be
placed into an open cell: =EXP(b*LN(2)), where b = the coefficient estimate for Ln(X). Results would be
equivalent to those obtained when the original data set was transformed using log base 10: a = 25.028, b = −0.43,
and LCR = 74.20%.