1 2 3 4 Even if we could observe all the members of a population, it is often costly to do so in terms of time, money, and/or computing power. The additional benefit gained is typically not worth the additional cost. The field of statistics is specifically designed to alleviate the need to observe an entire population.

5 Although it is possible to know parameters, it is most often the case that we don’t know them with certainty. The field of inferential statistics is devoted to inferring population parameters from sample characteristics.

6 In order to create valid statistical inferences, we must choose methods that are valid for the data we use. The underlying nature of the data affects the choice of statistical modeling techniques.

Nominal Scales
Nominal scales are purely categorical; you cannot perform mathematical transformations on them or make inferences based on their values without further statistical analysis. They are not rank‐ordered. This is the weakest form of measurement scale. An assigned nominal value of 1 is not necessarily “better” than an assigned nominal value of 2, nor is the value of 2 twice that of 1.

Ordinal Scales
Ordinal scales are ranking scales wherein we can determine which ranks are “better,” but not necessarily by how much. For example, with an ordinal scale, a 2 may be better than a 1, but we cannot necessarily say that 2 is twice as good as 1. We cannot perform mathematical transformations on data of this type either.

Interval Scales
Interval scales provide both ranking and equal differences between scale values; 7 hence, continuing our prior example, 2 is better than 1 and 2 is a single unit better than 1, but still not necessarily twice as good. With interval scales, the zero point of the scale does not indicate the absence of the attribute. An easy example of such a scale is temperature. Without a true zero point, we cannot form ratios.

Ratio Scales
Ratio scales are interval scales with a true zero point.
We can perform a variety of mathematical and statistical operations on such scales. The “strength” of the scale is a function of our ability to analyze the data meaningfully using mathematics and statistics; it increases as we increase the “informativeness” of the data in a mathematical sense: the more operations that can be meaningfully applied, the more “informative” the data.

7 We will use these returns for the next several slides.

Equation (on slide): the holding period return for time period t,
Rt = (Pt − Pt−1 + Dt)/Pt−1, where
Pt = Price per share at the end of time period t
Pt−1 = Price per share at the end of time period t − 1, the time period immediately preceding time period t
Dt = Cash distributions received during time period t

Two important features of this specific return calculation: 1) an element of time is associated with it, and 2) the rate of return has no unit associated with it (for example, currency).

8 9 10 Most statistical software and mathematical analysis packages, including Excel, will automatically form frequency distributions for the user. The user can typically specify the values in Steps 3 and 4 (functionally related to 3). Guidelines for setting k vary, but generally speaking, the balance is between summarizing the data enough to be useful and not losing relevant characteristics.

11 12 This example uses holding period returns to demonstrate forming a frequency table. For interval width, we show that [11.43 − (−4.57)]/4 = 4.

Ideally, we’d like the intervals in our distribution to be easy to identify and understand. That likely means using integers as interval endpoints and ensuring that we have few empty intervals (preferably none). If possible, it also means ensuring that our distribution “breaks” at natural points. A good example of this with returns is to ensure the distribution breaks at zero so we can classify returns as positive or negative. Ultimately, we want the frequency distribution to give us an idea of the centrality, shape, and dispersion of the data.
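As a sketch of the construction just described: the 12 returns below are hypothetical stand-ins spanning the −4.57 to 11.43 range mentioned above, not the slide’s actual observations.

```python
# Hypothetical holding-period returns (%), 12 observations spanning the
# slide's range of -4.57 to 11.43; the actual slide data are not shown here.
returns = [-4.57, -0.6, 1.2, 2.8, 3.1, 3.9, 4.4, 5.0, 6.3, 8.1, 9.9, 11.43]

k = 4                                  # chosen number of intervals
lo, hi = min(returns), max(returns)
width = (hi - lo) / k                  # [11.43 - (-4.57)] / 4 = 4.00

# Absolute frequency: count observations per interval; the last interval
# is closed on the right so the maximum observation is included
abs_freq = [0] * k
for r in returns:
    idx = min(int((r - lo) // width), k - 1)
    abs_freq[idx] += 1

n = len(returns)
rel_freq = [f / n for f in abs_freq]   # relative frequency per interval

# Cumulative relative frequency: running sum down the relative column
cum_freq = []
total = 0.0
for rf in rel_freq:
    total += rf
    cum_freq.append(total)

for i in range(k):
    print(f"[{lo + i * width:6.2f}, {lo + (i + 1) * width:6.2f})"
          f"  abs={abs_freq[i]}  rel={rel_freq[i]:.3f}  cum={cum_freq[i]:.3f}")
```

Closing the last interval on the right is one common convention; the key point is that every observation falls in exactly one interval and the relative frequencies sum to 1.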
We should have an idea of where most of the observations lie and the relative distribution of observations across the possible range of values.

13 The prior example has been continued for ease of exposition. Relative frequency is the absolute frequency value for a given interval divided by the total number of observations (in this case 12). Cumulative frequency is the sum of the relative frequency value for that cell and the prior cells in the relative frequency column; hence, 0.583 = (0.250 + 0.333), and so on.

This is a good moment to analyze the table in the slide for the properties mentioned in the prior slide. The frequency distribution fails on several counts, most notably not having a “natural” break at zero and not using integer interval cutoffs. Given the underlying data, this could easily have been accomplished without changing the table at all (there are no observations between −0.57 and 0). Cutoffs giving the same distribution are:
−5 ≤ observation < 0
0 ≤ observation < 4
4 ≤ observation < 7
7 ≤ observation ≤ 12

14 A histogram presents the data from a frequency distribution as a graphical representation of ordered data and magnitude by interval classification. It is designed to provide a more visual and intuitive sense of the centrality and dispersion of the data. The user should ensure that the intervals are ordered in such a manner that the histogram preserves the ordering of the original data. Most mathematical and statistical packages, including Excel, will produce histograms.

15 We construct a frequency polygon by connecting the midpoints of the interval tops (the absolute frequencies) with straight lines. In essence, we are just replacing the highest points of the bars with straight lines. This tool is quite useful when we have a large number of categories. Frequency polygons have a higher visual continuity than histograms. Steep slopes indicate higher rates of change as you move from one interval to the next, and shallow slopes indicate lower rates of change.
In the cumulative absolute frequency distribution polygon, the slope over an interval is proportional to the number of observations in that interval. The user should note that a frequency polygon, like a histogram, needs ordered data along its x‐axis.

16 For the arithmetic mean, please take note of the difference in notation: the Greek letter for population parameters and capitals for the variable and count; X‐bar for the sample statistic and lowercase for the variable and count. You may find the center‐of‐gravity analogy useful for providing the intuition behind the mean as a calculation. Arithmetic means are by far the most commonly used statistic in the investments arena and are generally viewed as a measure of the typical outcome. Population (sample) arithmetic means are means calculated for a population (sample). Note that we don’t normally expect the arithmetic mean to equal the value of any of the observations. Arithmetic means have a number of statistical properties, including sensitivity to outliers (a weakness) and use of all available data about the magnitude of the observations (a strength).

17 The MSCI EAFE Index is designed to represent the performance of large‐ and mid‐cap securities across 21 developed markets, including countries in Europe, Australasia, and the Far East, excluding the U.S. and Canada. The Index is available for a number of regions and market segments/sizes and covers approximately 85% of the free float‐adjusted market capitalization in each of the 21 countries.

18 This is a visualization of the mean. The fulcrum in this diagram is placed at the mean as calculated on the prior slide. Note that this means it needs to be slightly to the left of the geometric center of the distribution (Norway’s vertical bar, with a median of −29.72%) because some of the countries on the left had large negative returns.
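A minimal sketch of the arithmetic mean’s strength and weakness noted above, using made-up returns:

```python
# Illustrative (hypothetical) annual returns in percent
returns = [2.0, 3.0, 4.0, 5.0, 6.0]

# The arithmetic mean uses all available data about the magnitude
# of the observations (its strength)...
mean = sum(returns) / len(returns)
print(mean)  # 4.0

# ...but a single large outlier pulls it sharply (its weakness)
with_outlier = returns + [40.0]
mean_outlier = sum(with_outlier) / len(with_outlier)
print(mean_outlier)  # 10.0
```

Note that neither mean equals any individual observation, consistent with the point above.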
19 20 Weighted mean point
With a market‐value strategy, or constant‐proportions strategy, you can compute the mean first and then apply the weights, or apply the weights first and then compute the mean. Weighted averages occur throughout finance in all areas, including corporate finance (weighted average cost of capital) and investments (portfolio returns). A key feature of weighted averages is that the weights must sum to 1 (they can, however, be negative, depending on the application). A weighted mean in which the weights are probabilities is an expected value.

Geometric mean point
Geometric means are most commonly used with rates of return, rates of change over time, growth rates, and so on. You can substitute R with any of these rates. One problem we encounter with geometric returns (or one advantage of them over arithmetic) lies in how we handle negative returns. In using geometric mean returns, we normally add 1 to each return before taking the product and root and then subtract 1 from the resulting root.

21 Note that the weights must sum to 1. If we have only long positions, the weights must also all be positive, but with negative (short) positions, the weights could also be negative.

22 23 Median
The largest advantage associated with medians is a lack of sensitivity to extremely large values (outliers). If you suspect that the large values are a result of mismeasurement in the data or the inclusion of nonrepresentative units of analysis (sample contamination), then the median is probably a more appropriate measure of centrality than the mean. It will almost always be more appropriate when you have skewed data.

24 25 26 Population variance and sample variance are the two most widely used measures of dispersion; they have very nice mathematical and statistical properties, particularly in large samples.

27 28 These measures arise in large part because it is difficult to compare means and standard deviations across different samples or portfolios. They are both measures of relative dispersion.
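The add-1 procedure for geometric mean returns described above can be sketched as follows; the three annual returns are hypothetical.

```python
# Hypothetical annual returns, including a loss year
returns = [0.10, -0.05, 0.20]

arithmetic = sum(returns) / len(returns)

# Geometric mean return: add 1 to each return, take the product,
# take the n-th root, then subtract 1
product = 1.0
for r in returns:
    product *= 1 + r
geometric = product ** (1 / len(returns)) - 1

print(round(arithmetic, 4), round(geometric, 4))
```

The geometric mean comes out below the arithmetic mean here, which always holds unless all the returns are identical.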
Each expresses the magnitude of dispersion with respect to a common point. In the case of the coefficient of variation, that point is the mean of the observations. In the case of the Sharpe Ratio, that point is the mean of the returns in excess of a risk‐free return. BOTH ARE SCALE FREE, and thus provide ease of use in comparing dispersion among datasets with different distributions.

The Sharpe Ratio plays a prominent role in much of investment analysis, including the optimization of risky asset allocation in modern portfolio theory (more in Chapter 11). It is named after William Sharpe, a Nobel prize–winning economist, and is often used as a portfolio performance measurement tool. Two cautions in using the Sharpe Ratio:
Negative Sharpe Ratios have a counterintuitive interpretation (increasing risk increases the Sharpe Ratio), so comparisons of negative and positive Sharpe Ratios should be avoided.
The Sharpe Ratio also focuses on only one measure of risk: standard deviation. It will work well for portfolios with roughly symmetrical returns, but not so well for portfolios without them, including those with embedded options. Users of the Sharpe Ratio should ensure that it is an appropriate tool to assess a specific strategy or manager.

29 30 31 Often known as “higher moments,” skewness (third moment) and kurtosis (fourth moment; see next slide) both appear in the finance literature, with skewness having a decidedly larger presence than kurtosis, the degree of “peakedness.”

Skewness captures the degree of symmetry of dispersion around the mean. If a distribution has significantly more values on one side or the other of the mean, it is said to be skewed. A distribution with a significantly greater proportion of values close to and below the mean, with a few large values far above the mean, is said to be positively skewed (skewed right).
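Returning to the two relative-dispersion measures above, a minimal sketch with hypothetical returns and a hypothetical risk-free rate:

```python
import statistics

# Hypothetical portfolio returns (%) and risk-free rate; illustrative only
returns = [8.0, 12.0, -3.0, 15.0, 6.0]
risk_free = 2.0

mean = statistics.mean(returns)
stdev = statistics.stdev(returns)      # sample standard deviation

# Both measures are scale-free ratios of dispersion to a common point
cv = stdev / mean                      # dispersion per unit of mean return
sharpe = (mean - risk_free) / stdev    # excess return per unit of total risk

print(round(cv, 2), round(sharpe, 2))
```

Because both are ratios, the units of return cancel, which is what makes cross-portfolio comparisons possible.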
A distribution with a significantly greater proportion of values close to and above the mean, with a few large values far below the mean, is said to be negatively skewed (skewed left). The indication of skewness (left/right, negative/positive) always refers to the “long tail” of the distribution.

32 A leptokurtic distribution (positive excess kurtosis) is more peaked than the normal distribution, with more observations close to the mean and out in the tails; such distributions are often described as having “fat tails.” A mesokurtic distribution has peakedness equal to the normal distribution. A platykurtic distribution (negative excess kurtosis) is less peaked than the normal distribution; it is more evenly distributed across the range of possible values.

33 The underlying foundation of statistically based quantitative analysis lies with the concepts of a sample versus a population.
• We use sample statistics to describe the sample and to infer information about its associated population.
• Descriptive statistics for samples and populations include measures of centrality and dispersion, such as mean and variance, respectively.
• We can combine traditional measures of return (such as mean) and risk (such as standard deviation) to measure the combined effects of risk and return using the Sharpe Ratio.
The normal distribution is of central importance in investments, and as a result, we often compare statistical properties, such as skewness and kurtosis, with those of the normal distribution.

34 35 Two key assumptions underlie portfolio theory:
• Investors want to maximize returns for a given level of risk. If an investor is given a choice of two assets with equal levels of risk, they will choose the asset with the higher expected rate of return. Risk here is defined as the uncertainty of future outcomes or the probability of an adverse outcome.
• Investors are generally risk averse.
If an investor is given the choice of two assets with equal expected rates of return, then risk aversion results in the investor selecting the investment with the lower perceived level of risk. There are investors who might not be risk averse, although usually some risk aversion is combined with risk preference. Historical evidence over the long run has shown that most investors are risk averse, which means there is a positive relationship between expected return and expected risk.

36 Mean–variance analysis is the fundamental implementation of modern portfolio theory.
• It describes the optimal allocation of assets between risky and risk‐free assets when the investor knows the expected return and standard deviation of those assets.

37 Over half a century has passed since Professor Harry Markowitz established the tenets of mean‐variance analysis, or capital market theory, the focal point of which is the efficient frontier. Several assumptions underlie mean‐variance analysis. The assumptions establish a uniformity of investors, which greatly simplifies the analysis.

38 Mean‐variance analysis is used to identify optimal or efficient portfolios. Before we can discuss the implications of efficient portfolios, however, we must first be able to understand and calculate portfolio expected returns and standard deviations.
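Those two portfolio calculations can be sketched for a two-asset case; every input below is hypothetical.

```python
# Hypothetical two-asset inputs (weights must sum to 1)
w1, w2 = 0.40, 0.60        # portfolio weights
mu1, mu2 = 0.12, 0.18      # expected returns
sd1, sd2 = 0.15, 0.25      # standard deviations
rho = 0.30                 # correlation between the two assets' returns

# Portfolio expected return: weighted average of the assets' expected returns
exp_ret = w1 * mu1 + w2 * mu2

# Portfolio variance: weighted variances plus a covariance term
variance = (w1 * sd1) ** 2 + (w2 * sd2) ** 2 + 2 * w1 * w2 * rho * sd1 * sd2
stdev = variance ** 0.5

print(round(exp_ret, 4), round(stdev, 4))
```

With any correlation below +1, the portfolio standard deviation comes out below the weighted average of the individual standard deviations; that gap is the diversification benefit discussed in the following slides.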
39 40 Positively Correlated items tend to move in the same direction.
Negatively Correlated items tend to move in opposite directions.
Perfectly Positively Correlated describes two positively correlated series having a correlation coefficient of +1.
Perfectly Negatively Correlated describes two negatively correlated series having a correlation coefficient of −1.
Uncorrelated describes two series that lack any relationship and have a correlation coefficient of nearly zero.
Assets that are less than perfectly positively correlated tend to offset each other’s movements, thus reducing the overall risk in a portfolio. The lower the correlation, the more the overall risk in a portfolio is reduced:
• Assets with +1 correlation eliminate no risk
• Assets with less than +1 correlation eliminate some risk
• Assets with less than 0 correlation eliminate more risk
• Assets with −1 correlation eliminate all risk

41 42 Transcript: Hey guys, it’s MJ the Student Actuary, and in this video I want to very quickly explain mean–variance portfolio theory. I want to give a very quick overview, so we’re not actually going to get into the mathematics; we’re just going to get a very high‐level understanding. Now, the nice thing about finance is that the name kind of gives it away: mean you can think of as return, and variance you can think of as risk. And portfolio theory: think of two assets that we want to have in our portfolio. These can be bonds, which are assumed to be low risk, low return, and equity, which is assumed to be high return, high risk. The general idea was that you could have a combination of both of these assets, so this would be 100% equity, this would be 100% bonds, and this yellow line indicates something in between, and that the risk and return relationship would follow this yellow line. However, the big breakthrough was that this is actually not the case. Mean–variance portfolio theory says that the curve doesn’t actually look like that: it bends, and this is known as the efficient frontier. And why are we getting this bend? That was the whole thing, why they won the Nobel Prize and why everyone thinks this is such a cool theory: because what they showed was that the variance of the portfolio decreases with a thing known as diversification. So, by introducing two different assets (and the important thing here is that they must be uncorrelated), the less correlated they are, the more you’re going to get this bend. So if the assets are very correlated, you might just get a slight bend. And this was the whole idea: that by adding in assets that were uncorrelated, you could reduce risk and increase return. And this went against the whole general understanding that in order to get more return, you need to take on more risk. And that’s basically it. So that is it, very simply explained. Go check out a whole bunch of other videos on YouTube to get a more in‐depth understanding of the maths behind it, but I hope this clears it up very quickly for you guys. Thanks so much for watching.

43 44 45 46 In this AIA and Prudential example, we calculated the expected return and standard deviation of one possible combination: 40% in AIA and 60% in Prudential. An infinite number of combinations of the two stocks are possible, however.
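The “infinite number of combinations” can be approximated by sweeping the weight across a grid; the inputs below are hypothetical, not the AIA/Prudential figures.

```python
# Hypothetical two-asset inputs; sweeping the weight traces the curve
# that would be plotted in risk/return space
mu1, mu2 = 0.12, 0.18      # expected returns
sd1, sd2 = 0.15, 0.25      # standard deviations
rho = 0.30                 # correlation
cov = rho * sd1 * sd2

points = []
for i in range(101):                   # w1 from 0.00 to 1.00 in steps of 0.01
    w1 = i / 100
    w2 = 1 - w1
    er = w1 * mu1 + w2 * mu2
    sd = ((w1 * sd1) ** 2 + (w2 * sd2) ** 2 + 2 * w1 * w2 * cov) ** 0.5
    points.append((sd, er))

# The end points are the individual assets; interior points bow to the
# left of the straight line between them -- the diversification effect
print(points[0])    # 100% in asset 2
print(points[-1])   # 100% in asset 1
```

Plotting `points` with standard deviation on the x-axis and expected return on the y-axis would reproduce the kind of curve shown on the slide.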
We can plot these combinations on a graph with expected return on the y‐axis and standard deviation on the x‐axis, commonly referred to as plotting in risk/return “space.”

47 The plot represents all possible expected return and standard deviation combinations attainable by investing in varying amounts of AIA and PRU.

48 There are several things to notice about this plot. 1) If 100% of the portfolio is allocated to Prudential, the portfolio will have the expected return and standard deviation of Prudential (i.e., Prudential is the portfolio), and the investment return and risk combination is at the uppermost end of the curve.

49 2) As the investment in Prudential is decreased and the investment in AIA is increased, the investment moves down the curve to the point where the portfolio’s expected return is 15.3% and its standard deviation is 13.6%. (Labelled as Point C.)

50 Minimum‐variance portfolio
With multiple assets, there is an infinite number of possible weights that can be used to achieve any specific level of return, but only one set of weights gives the smallest possible level of variance for that level of return. This set of weights forms the minimum‐variance portfolio for that level of return.

Minimum‐variance frontier
Because our investors are risk averse, at any fixed level of return they will prefer to hold the asset combination with that minimum‐variance set of weights. When we determine the set of minimum‐variance weights for all possible levels of return, we have determined the weights for all the portfolios on the minimum‐variance frontier.

51 Portfolios on the efficient frontier provide the highest possible level of return for a given level of risk. The efficient frontier is an extremely useful portfolio management tool. Once the investor’s risk tolerance is determined and quantified in terms of variance or standard deviation, the optimal portfolio for the investor can be easily identified.
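For two assets, the minimum-variance weights have a well-known closed form; here is a sketch with hypothetical inputs, checked against a grid search.

```python
# Hypothetical two-asset inputs (not values from the slides)
sd1, sd2, rho = 0.15, 0.25, 0.30
cov = rho * sd1 * sd2

def port_var(w1):
    """Variance of a two-asset portfolio with weight w1 in asset 1."""
    w2 = 1 - w1
    return (w1 * sd1) ** 2 + (w2 * sd2) ** 2 + 2 * w1 * w2 * cov

# Closed-form weight in asset 1 that minimises portfolio variance
w_min = (sd2 ** 2 - cov) / (sd1 ** 2 + sd2 ** 2 - 2 * cov)

# A grid search over weights should find nothing with lower variance
grid_best = min(port_var(i / 1000) for i in range(1001))

print(round(w_min, 3), round(port_var(w_min), 6), round(grid_best, 6))
```

With more than two assets the same idea becomes a quadratic optimisation over the weight vector, which is how points on the minimum-variance frontier are found in practice.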
52 In mean‐variance analysis, we use the expected returns, variances, and covariances of individual investment returns to analyse the risk‐return tradeoff of combinations (portfolios) of individual investments. The minimum‐variance frontier is a graph drawn in risk‐return space of the set of portfolios that have the lowest variance at each level of expected return. The efficient frontier is the positively sloped portion of the minimum‐variance frontier. Portfolios on the efficient frontier have the highest expected return at each given level of risk. 53