Morningstar's risk-adjusted ratings

William F. Sharpe. Financial Analysts Journal. Charlottesville: Jul/Aug 1998. Vol. 54,
Iss. 4; pg. 21, 13 pgs
Abstract (Document Summary)
The characteristics of the risk-adjusted rating (RAR) on which Morningstar bases its star
ratings and category ratings are analyzed, and the RAR is compared with more
traditional mean-variance measures. The RAR measure has characteristics similar to
those of an expected utility function based on an underlying bilinear utility function. Strict
adherence to maximizing expected utility with such a function could lead to extreme
investment strategies. A study finds that Morningstar varies one of the parameters of this
function in a manner that frequently produces results similar to the results of using the
excess-return Sharpe ratio. The argument is presented that neither Morningstar's
measure nor the excess-return Sharpe ratio is an efficient tool for choosing mutual funds
within peer groups for a multifund portfolio.
Full Text (3417 words)
Copyright Association for Investment Management and Research Jul/Aug 1998
The characteristics of the "risk-adjusted rating" (RAR) on which Morningstar bases its
"star ratings" and "category ratings" are analyzed, and the RAR is compared with more
traditional mean-variance measures. The RAR measure has characteristics similar to
those of an expected utility function based on an underlying bilinear utility function.
These characteristics are of some concern because strict adherence to maximizing
expected utility with such a function could lead to extreme investment strategies. This
study finds that Morningstar varies one of the parameters of this function in a manner
that frequently produces results similar to the results of using the excess-return Sharpe
ratio. Finally, the argument is presented that neither Morningstar's measure nor the
excess-return Sharpe ratio is an efficient tool for choosing mutual funds within peer
groups for a multifund portfolio.
This decade's rapid, global growth of investment via mutual funds has led to a demand
for simple measures of the performance of such funds. In the United States, the most
popular such measure is the "risk-adjusted rating" (RAR) produced by Morningstar, Inc.
Mutual fund families are proud to advertise that one or several of their funds have
"received five stars from Morningstar." One study found that as much as 90 percent of
new money invested in stock funds in 1995 went to funds with four-star or five-star
ratings.1 Although this percentage may or may not be correct for mid-1998, certainly few
advertisements announce that a fund has received one star.
For better or worse, therefore, Morningstar's risk-adjusted measures greatly influence
U.S. investor behavior. Because they differ significantly from traditional risk-adjusted
performance measures, such as various forms of the Sharpe ratio, understanding the
strengths and limitations of Morningstar's measures is important.
Ex Ante and Ex Post Measurement
Mutual fund performance measures are typically based on one or more summary
statistics of past performance. Measures that attempt to take risk into account
incorporate a measure of historical return and a measure of historical variability or loss.
Because investment decisions affect only the future, the use of historical results involves
an implicit assumption that the statistics derived from past performance have at least
some predictive content for future performance. The evidence is ample that, although
measures of historical variability can be useful for predicting future levels of risk,
measures of average or cumulative return are, at best, highly imperfect predictors of
expected future return. Nevertheless, leaving questions of predictability for other studies,
the goal of this article is to examine the properties of Morningstar's and other measures
under the assumption that statistics from historical frequency distributions are reliable
predictors of corresponding statistics from a probability distribution of future returns.2 In
particular, the aim is to relate alternative performance measures to likely investment
decisions; the rationale is that, even if the relationship between the past and the future is
subject to a great deal of noise, one should attempt to select a performance measure
that aligns well with the decision to be undertaken. Ultimately, of course, the intent is,
first, to use all relevant information to make unbiased forecasts of expected returns,
risks, and any other relevant characteristics of future fund performance and, then, to use
such estimates to determine an optimal combination of investments in appropriate funds.
This analysis of the Morningstar measures focuses on their key properties. The reader
interested in empirical analyses of the Morningstar and more traditional measures, and
analyses of the similarities and differences among them in practice, will find a relatively
extensive treatment in Sharpe (1997).
Morningstar's Ratings
(Visit the website to study the figures, data, and other formulas in more detail.)
The analysis begins with a description of the computations used by Morningstar. The
risk-adjusted rating (RAR) for a fund is calculated by subtracting a measure of the fund's
relative risk (RRisk) from a measure of its relative return (RRet):
    RAR_i = RRet_i - RRisk_i
So, if both Risk_i and Ret_i are well approximated as functions of M_i and S_i, then RAR_i
will be also. Figure 4 shows the relationship between RAR and various combinations of
expected annual excess return, e, and standard deviation of annual excess return, sd,
using the approximations given for the case in which the riskless rate of interest is 5
percent a year, the holding period is three years, and the peer group has an average
excess return of 5 percent and a standard deviation of 15 percent. As can be seen, the
relationship is monotonic and close to linear in the region shown, which includes likely
combinations for popular investment strategies.
The high degree of linearity of the relationship in Figure 4 can be seen more clearly in
Figure 5, which shows a few of the associated iso-RAR curves. Clearly, an investor who
wishes to maximize RAR is likely to select an extreme solution unless the opportunity
set is highly nonlinear.
Recall that a portfolio is said to be mean-variance efficient if it provides the maximum
possible mean for a given level of variance and the minimum possible variance for a
given level of mean. Equivalently, Fund A is said to be inefficient if a Fund B exists with
(1) the same expected return but less risk, (2) the same risk but more expected return,
or (3) less risk and more expected return. With functions such as those shown in
Figures 4 and 5, in each such case, if the approximations hold, Fund B will also have a
higher RAR value than Fund A. Thus, excluding from consideration portfolios that are
inefficient, using the mean-variance criterion is appropriate even if the ultimate goal is
to select a portfolio with the largest possible RAR value.
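The dominance argument can be checked numerically. The sketch below is illustrative only: it replaces the article's approximations with a hypothetical linear stand-in (relative return minus relative risk) in which standard deviation serves as the risk proxy, and it sets the two bases to the peer-group values quoted for Figure 4 (5 percent excess return, 15 percent standard deviation). The function name approx_rar and the fund values are assumptions, not Morningstar's procedure.

```python
def approx_rar(excess_return, std_dev, return_base=5.0, risk_base=15.0):
    """Assumed linear stand-in for RAR: relative return minus relative risk."""
    return excess_return / return_base - std_dev / risk_base


# Fund A: (expected excess return, standard deviation), in percent a year
fund_a = (5.0, 15.0)

# Three hypothetical funds that dominate Fund A in the mean-variance sense
dominating_funds = {
    "same return, less risk": (5.0, 12.0),
    "same risk, more return": (6.0, 15.0),
    "less risk, more return": (6.0, 12.0),
}

for label, (e, sd) in dominating_funds.items():
    beats_a = approx_rar(e, sd) > approx_rar(*fund_a)
    print(f"{label}: higher RAR than Fund A? {beats_a}")
```

Under this stand-in, each dominating fund also scores higher on RAR, which is why discarding mean-variance inefficient funds first costs nothing when the goal is the largest RAR.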
These relationships imply that the key differences between Morningstar's measures and
those used in more-traditional mean-variance analyses concern (1) the use of a linear
combination of a return measure and a risk measure, rather than a ratio of the two,
and/or (2) the use of risk per se rather than risk squared in the linear measure. The use
of a multiperiod value relative and a measure of average loss is thus of secondary
importance in terms of implications for fund selection.
These results illustrate the earlier assertion that Morningstar's actual RAR calculations
give implications for investment choice very similar to those given by the simpler
modified (MRAR) measure. Moreover, they suggest that if monthly returns are close to
normally distributed, a choice based on RAR measures will differ from one based on the
use of a traditional mean-variance approach only in the selection of an extreme point
rather than an interior point on the mean-variance efficient frontier. In effect, the RAR
measure assumes that an investor's marginal rate of substitution of expected return for
risk is the same no matter the level of his or her portfolio's return or risk. This assumption
is unfortunate because it implies a preference for extreme risk-return combinations,
which is inconsistent with investor behavior in this context and in more general cases
involving choices among competing alternatives.
RARs and Excess-Return Sharpe Ratios
Conceptually, fund rankings based on RAR values are different from rankings based on
excess-return Sharpe ratios. This difference can be seen in Figure 6, which shows
selected iso-excess-return Sharpe ratio (iso-SR for short) lines and selected
mean-variance approximations of iso-RAR curves.
To assess the likely magnitudes of such differences, consider a selected mutual fund,
Fund X, and the iso-RAR and iso-SR lines on which it lies. Figure 7 shows a case in
which Fund X has an expected return of 10 percent and a standard deviation of 15
percent. The set of all funds that are better than Fund X based on the RAR criterion will
lie above the dotted line in Figure 7, and the set of all funds that are worse than Fund X
based on the RAR criterion will lie below the dotted line. The funds that are better than
Fund X based on the excess-return Sharpe ratio will lie above the solid line, and those
that are worse will lie below the solid line.
Obviously, the sets of funds rated better or worse than Fund X will differ depending on
the criterion used, but the differences may be few. Figure 8 shows the regions in which
the criteria give different results. Any fund plotting in the gray shaded area will have a
higher RAR than Fund X but a lower excess-return Sharpe ratio. Any fund plotting in the
black shaded area will have a lower RAR than Fund X but a higher excess-return
Sharpe ratio. For all funds that plot above both lines or below both lines, however, the
criteria will lead to the same conclusion. In general, the closer the slopes of the two
lines, the fewer the disparities in rankings between the two criteria.
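The regions of agreement and disagreement can be sketched in code. This is a minimal illustration, assuming straight iso-RAR lines with a hypothetical slope of 0.25; Fund X's figures (10 percent expected return, 15 percent standard deviation) and the 5 percent riskless rate come from the text, while the function names, the slope, and the three comparison funds are invented for the example.

```python
RISKLESS = 5.0  # percent a year, as in the article's examples


def sharpe_ratio(expected_return, sd):
    """Excess-return Sharpe ratio."""
    return (expected_return - RISKLESS) / sd


def linear_rar_score(expected_return, sd, slope):
    """Assumed stand-in that is constant along straight lines of the given slope."""
    return (expected_return - RISKLESS) - slope * sd


def compare_with(fund, reference, slope):
    """Classify a fund relative to a reference fund under both criteria."""
    higher_rar = linear_rar_score(*fund, slope) > linear_rar_score(*reference, slope)
    higher_sr = sharpe_ratio(*fund) > sharpe_ratio(*reference)
    if higher_rar and not higher_sr:
        return "higher RAR, lower Sharpe ratio"
    if higher_sr and not higher_rar:
        return "lower RAR, higher Sharpe ratio"
    return "criteria agree"


fund_x = (10.0, 15.0)   # expected return, standard deviation (from the text)
ISO_RAR_SLOPE = 0.25    # hypothetical; Fund X's own iso-SR slope is 1/3

for fund in [(11.5, 20.0), (8.5, 10.0), (11.0, 15.0)]:
    print(fund, "->", compare_with(fund, fund_x, ISO_RAR_SLOPE))
```

Moving ISO_RAR_SLOPE toward 1/3 shrinks both disagreement regions, which is the point made about the slopes of the two lines.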
Now, recall the procedures used to compute Morningstar's RAR measures. The slope of
the iso-RAR curves is given by the ratio of the return base to the risk base. If the period
used for the computation is one in which the average return for the funds in the relevant
peer group was sufficiently high (greater than two times the return on T-bills), the return
base will equal the mean excess return for the funds in the peer group. In every case,
the risk base is the mean risk for the funds in the peer group.
Let Fund A have mean excess return and standard deviation of return equal, respectively,
to the corresponding average values for all the funds in its peer group. By construction,
then, the mean-variance approximation to the iso-RAR line for Fund A will be coincident
with the iso-SR line for Fund A. In such circumstances, the sets of funds that are better
and worse than Fund A will be the same no matter which criterion is used.
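The coincidence of the two lines for Fund A can be written out under a simplifying assumption that is not the article's own derivation: suppose the risk measure is proportional to standard deviation, so that relative risk is $s_i/\bar{s}$, and relative return is $e_i/\bar{e}$, where $e_i$ and $s_i$ are a fund's mean excess return and standard deviation and $\bar{e} > 0$ and $\bar{s}$ are the peer-group averages. All of these symbols are introduced here for illustration.

```latex
\begin{align*}
RAR_i &\approx \frac{e_i}{\bar{e}} - \frac{s_i}{\bar{s}} \\
RAR_A &\approx \frac{\bar{e}}{\bar{e}} - \frac{\bar{s}}{\bar{s}} = 0
  \qquad \text{(Fund A: } e_A = \bar{e},\; s_A = \bar{s}\text{)} \\
RAR_i = RAR_A
  &\iff e_i = \frac{\bar{e}}{\bar{s}}\, s_i
  \iff \frac{e_i}{s_i} = \frac{\bar{e}}{\bar{s}} = SR_A
\end{align*}
```

Under these assumptions the iso-RAR line through Fund A has slope $\bar{e}/\bar{s}$, the ratio of the return base to the risk base, and it is the same ray from the origin as Fund A's iso-SR line.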
The same can be said about any fund that plots on Fund A's iso-SR (and iso-RAR) line,
that is, any fund with the same excess-return Sharpe ratio as a fund with the average
risk and return for the peer group. In practice, funds are likely to cluster reasonably
closely around this line. Hence, peer groups with good average historical performance
might be expected to have rankings based on Morningstar's RAR measure relatively
similar to their rankings based on the excess-return Sharpe ratio. And Figure 9, taken
from Sharpe (1997), shows that this expectation is supported.
Each point in Figure 9 represents the percentile rankings (by Morningstar on the x-axis
and by the Sharpe ratio on the y-axis) of one of 1,286 diversified equity funds within its
category peer group based on its performance from 1994 through 1996. The correlation
coefficient was 0.986, showing that, despite substantial differences in computational
procedures, Morningstar's approach and the simpler excess-return Sharpe ratio do give
similar results in times of relatively high returns for U.S. equity funds.
These results are striking, but keep in mind that they were found during a high-return
period and, therefore, Morningstar's procedure used the mean returns of the peer groups
for the return bases in the calculations. Because ex post returns are used for the
performance measures, there can be situations in which the average return for a peer
group is small or even negative. In such cases, Morningstar sets the return base at the
level obtained by T-bills, which could lead to greater disparity between rankings based on
the Morningstar and the Sharpe ratio measures. Figure 10 shows an extreme version of
such a situation.
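The return-base rule described above can be sketched as a small function. The function and argument names are hypothetical; the rule itself (the peer group's mean excess return when its average return exceeds twice the T-bill return, otherwise the T-bill return) is taken from the text.

```python
def return_base(peer_mean_return, tbill_return):
    """Return base as described in the text; inputs in percent over the period."""
    if peer_mean_return > 2 * tbill_return:
        # High-return period: use the peer group's mean excess return
        return peer_mean_return - tbill_return
    # Low or negative average return: fall back to the T-bill return
    return tbill_return


# Hypothetical periods with a 5 percent T-bill return
for peer_mean in (12.0, 6.0, -2.0):
    print(peer_mean, "->", return_base(peer_mean, 5.0))
```

For a positive T-bill return, the rule is equivalent to taking the larger of the peer group's mean excess return and the T-bill return, so the return base never falls to zero or below even when the peer group's average return is negative.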
Figure 9.
In Figure 10, both Funds X and Y are shown to have performed poorly. But Fund Y had
a better (algebraically greater, or less negative) excess-return Sharpe ratio than Fund X,
as shown by its placement on a higher iso-SR (solid) line. Morningstar's RAR measure,
however, assigns a better rating to Fund X than to Fund Y because Fund X provided a
better average return and a lower risk, which resulted in Fund X plotting on a higher
iso-RAR (dotted) line.
This example highlights the differences in the questions the two measures attempt to
answer. The argument here is that the RAR measure is best viewed as an attempt to
determine a best single fund, on the assumption that only one fund is to be held in the
investor's portfolio. In this context, Fund X was certainly better (in this example, less
bad) than Fund Y. Moreover, Fund X would be superior for any (positive) degree of
investor risk aversion (any slope of the iso-RAR lines). An investor limited to one fund is
not the setting, however, for which the excess-return Sharpe ratio was developed. It is
intended for situations in which an investor can use borrowing or lending to achieve his
or her desired level of risk. In this latter context, the excess-return Sharpe ratio gives the
more appropriate answer: An investor who desired a level of risk of, say, 10 percent
would have either held Fund X or followed a strategy of putting two-thirds of his or her
wealth in Fund Y and lending the rest at the riskless rate (here, 5 percent). The latter
strategy, shown by point Y' in Figure 10, had a greater excess-return Sharpe ratio and
was clearly better than investment solely in Fund X.
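The comparison between Fund X and the lending strategy Y' can be reproduced with a few lines of arithmetic. The fund figures below are hypothetical, chosen only to match the qualitative story of Figure 10 (both funds below the riskless rate, Fund X with the better return and lower risk, Fund Y with the less negative Sharpe ratio). The 5 percent riskless rate and the two-thirds weight come from the text, and the function names are assumptions.

```python
import math

RISKLESS = 5.0  # percent, the riskless rate used in the example


def lend_and_hold(fund_return, fund_sd, weight):
    """Return and risk from putting `weight` of wealth in a fund and lending the rest."""
    mixed_return = weight * fund_return + (1.0 - weight) * RISKLESS
    mixed_sd = weight * fund_sd  # riskless lending adds no variability
    return mixed_return, mixed_sd


def sharpe_ratio(average_return, sd):
    """Excess-return Sharpe ratio."""
    return (average_return - RISKLESS) / sd


# Hypothetical (average return, standard deviation) pairs, in percent
fund_x = (1.0, 10.0)
fund_y = (0.5, 15.0)

y_prime = lend_and_hold(*fund_y, weight=2.0 / 3.0)

print("Fund X :", fund_x, "Sharpe", round(sharpe_ratio(*fund_x), 3))
print("Fund Y :", fund_y, "Sharpe", round(sharpe_ratio(*fund_y), 3))
print("Y prime:", tuple(round(v, 3) for v in y_prime),
      "Sharpe", round(sharpe_ratio(*y_prime), 3))
```

With these numbers, Y' carries the same risk as Fund X but a higher return, and its Sharpe ratio equals that of Fund Y because lending scales excess return and risk by the same factor.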
Multifund Portfolios
Morningstar's measure is best suited to answer questions posed by an investor who
places all of his or her money in one fund. The excess-return Sharpe ratio is best suited
to answer questions posed by an investor who allocates money to one fund and also to
borrowing or lending. Neither type of investor should be interested in ranking funds
within peer groups; indeed, such rankings conceal information about the relative
magnitudes of the underlying variables, information that is crucial for such investors.
Why, then, does Morningstar present its risk-adjusted ratings in terms of rankings of
funds within peer groups?
The most plausible answer is that Morningstar assumes that investors have some other
basis for allocating funds and plan to use Morningstar's rankings as at least an important
input when deciding which fund or funds to choose from each peer group. In such a
situation, however, neither Morningstar's measure nor the excess-return Sharpe ratio is
an appropriate performance measure. The reason is simple: When evaluating the
desirability of a fund in a multifund portfolio, the relevant measure of risk is the fund's
contribution to the total risk of the portfolio. This contribution will depend on the fund's
total risk and, more importantly in most cases, on its correlations with the other funds in
the portfolio. Neither the Morningstar RAR measure nor the excess-return Sharpe ratio
incorporates any information about correlations. Therefore, excessive reliance on either
measure for selecting funds could seriously diminish the effectiveness of the resulting
multifund portfolio.
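A small numerical sketch shows why correlations matter. It assumes a two-part portfolio (the investor's existing holdings plus one added fund) and uses the standard two-asset variance formula; all figures and names are hypothetical. The two candidate funds have identical stand-alone risk, so any measure that looks only at a fund's own return and risk cannot tell them apart.

```python
import math


def portfolio_sd(core_sd, fund_sd, correlation, fund_weight):
    """Standard deviation of a two-part portfolio: existing holdings plus one fund."""
    w_f, w_c = fund_weight, 1.0 - fund_weight
    variance = (
        (w_c * core_sd) ** 2
        + (w_f * fund_sd) ** 2
        + 2.0 * w_c * w_f * correlation * core_sd * fund_sd
    )
    return math.sqrt(variance)


CORE_SD = 12.0   # hypothetical risk of the investor's other holdings, in percent
FUND_SD = 15.0   # both candidate funds have the same stand-alone risk
WEIGHT = 0.25    # hypothetical share of wealth placed in the new fund

for label, rho in [("Fund P (correlation 0.9)", 0.9), ("Fund Q (correlation 0.2)", 0.2)]:
    print(label, round(portfolio_sd(CORE_SD, FUND_SD, rho, WEIGHT), 2))
```

The fund with the lower correlation leaves the overall portfolio less risky even though the two funds look identical in isolation.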
Figure 10.
In some very special cases, however, a different single measure of fund performance
may be useful for constructing an optimal multifund portfolio. Sharpe (1994) showed that
the "selection Sharpe ratio," based on the mean and standard deviation of the difference
between a fund's return and that of an appropriate asset-class benchmark, may be used
if the investor may take long and short positions in asset classes as needed. The
preconditions for this special case are not met in many cases, however, and even if they
are, significant differences can exist between rankings based on excess-return Sharpe
ratios and rankings based on selection Sharpe ratios. Given the relationships between
RARs and excess-return Sharpe ratios, rankings based on selection Sharpe ratios will
also differ considerably from those based on RARs.
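The selection Sharpe ratio as described here (the mean of the difference between a fund's return and its benchmark's return, divided by the standard deviation of that difference) can be sketched as follows. The return series are hypothetical, the function name is an assumption, and the sample standard deviation is one possible convention rather than one the text specifies.

```python
from statistics import mean, stdev


def selection_sharpe_ratio(fund_returns, benchmark_returns):
    """Mean of (fund - benchmark) divided by the standard deviation of that difference."""
    differences = [f - b for f, b in zip(fund_returns, benchmark_returns)]
    return mean(differences) / stdev(differences)


# Hypothetical periodic returns, in percent
fund = [1.0, 2.0, 0.5, 1.5]
benchmark = [0.5, 1.5, 0.5, 1.0]

print(selection_sharpe_ratio(fund, benchmark))
```

Because only the differential return enters the calculation, a fund can rank well on this measure while ranking poorly on total or excess return, and vice versa.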
In short, in many cases, using any procedure to rank funds within peer groups and then
using the rankings to select one or more funds from each of several peer groups is likely
to be suboptimal. In some cases, the process will be highly suboptimal.
Morningstar's RAR measure has a number of drawbacks. It is complex, and it has poor
statistical qualities. More importantly, it fails to capture an important aspect of investor
preferences: the desire for portfolios that are neither the least nor most risky available.
Fortunately, the inherent disadvantages are considerably mitigated by Morningstar's
practice of adjusting the risk aversion implicit in the measure to equal the ratio of return
to risk for each peer group over the specific period covered, although this adjustment is
made only if the peer group performance has been sufficiently good. This adjustment
increases the time and sample dependency of the measure, but it has the advantage of
aligning the Morningstar rankings well with rankings that would be obtained by using the
more familiar, less complex, and statistically more straightforward excess-return Sharpe
ratio.
If the only choice for a measure by which to select funds is between Morningstar's RAR
measure and the excess-return Sharpe ratio, the evidence favors selecting the Sharpe
ratio. A more appropriate choice, however, is to use either a different performance
measure or none at all. If the investor can separate fund selection from asset allocation
without cost by taking long and short positions as needed in index funds representing
"pure asset plays," the investor can usefully evaluate funds on the basis of their
projected selection Sharpe ratios. Such measures take into account only a fund's
non-asset-related expected return and risk. Typically, rankings based on selection Sharpe
ratios will differ considerably from rankings based on Morningstar's measures or
excess-return Sharpe ratios, and of course, the preferred portfolios that result will also
differ.
Although it is tempting to conclude that investors constructing multifund portfolios should
shift their focus from performance measures based on total or excess return to
measures such as the selection Sharpe ratio that are based on differential or
relative-to-benchmark return, the conclusions of this study do not lead to such counsel. The
conditions under which the selection Sharpe ratio is appropriate are stringent and
unlikely to hold for many investors. Rather than continue the search for the ideal
universal performance measure, the preferable approach is to return to basics.
Markowitz taught that portfolio construction should take into account the best possible
estimates of all relevant future risks and returns. This principle is as true for portfolios of
mutual funds as it is for portfolios of individual securities. Asset allocation exercises,
followed by selection of funds within peer groups based on simple rankings, are easy,
but they may lead to inefficient overall portfolios. A better approach, one based on first
principles, takes into account the complexity involved in portfolio decisions. The key
information an investor needs to evaluate a mutual fund is (1) the fund's likely future
exposures to movements in major asset classes, (2) the likely added (or subtracted)
return over and above a benchmark with similar exposures, and (3) the likely risk
vis-a-vis the benchmark. Investors should devote their efforts to obtaining the best possible
estimates for future values of these key ingredients and to using the estimates optimally
to determine efficient portfolios.
1. Described in Damato (1996).
2. Paul R. Pudaite, Morningstar's director of Quantitative Research, has pointed out that
one does not need to make the assumption that future performance is related to past
performance to justify the measurement of the latter: "I believe there's a place for
achievement tests as well as aptitude tests of investment management. Independent of
predictive ability, it's important to develop a grading system of past performance that
provides today's investment managers with a clear and worthy goal for their future
efforts." (Private correspondence, February 1998).
For the calculations used by Morningstar, whether the sign is reversed makes no
difference because of the subsequent division by the risk base, which is an average of
all the risk numbers. The sign is reversed here for ease of interpretation so that, as with
standard deviation, a smaller absolute value of risk will be considered more desirable
than a larger absolute value.
This function was obtained, while taking into account the relationship shown in Equation
A1 in Triantis and Hodder, by integrating over negative values of the excess return.
Damato, Karen. 1996. "Morningstar Edges Toward One-Year Ratings." Wall Street
Journal (April 5):C1.
Kahneman, Daniel, and Amos Tversky. 1979. "Prospect Theory: An Analysis of Decision
under Risk." Econometrica, vol. 47, no. 2 (March):263-91.
Markowitz, Harry. 1952. "Portfolio Selection." Journal of Finance, vol. 7, no. 1
(March):77-91.
Sharpe, William F. 1966. "Mutual Fund Performance." Journal of Business, vol. 39,
no. 1 (January):119-38.
Sharpe, William F. 1994. "The Sharpe Ratio." Journal of Portfolio Management, vol. 21,
no. 1 (Fall):49-59.
Sharpe, William F. 1997. "Morningstar Performance Measures."
Tobin, James. 1958. "Liquidity Preference as Behavior towards Risk." Review of
Economic Studies, vol. 26, no. 1 (February):65-86.
Triantis, Alexander J., and James E. Hodder. 1990. "Valuing Flexibility as a Complex
Option." Journal of Finance, vol. 45, no. 2 (June):549-64.