QMETH Note 1: Testing for Normality

As Sharon applies a normal distribution to forecast the rate of return of a stock that has attracted her attention as an investment, a natural question occurs to her: how can she find out whether the rate of return of a stock actually follows a normal distribution? Although normality of a stock's rate of return is accepted within the investment profession, Sharon wonders whether she can confirm the validity of the practice. In this section, we learn tools for checking whether the normal distribution is a reasonable model for a variable.

Histogram

A simple method is to collect data on the rate of return of a stock for a reasonable number of periods, construct a histogram, and compare its shape with the normal distribution whose mean and standard deviation equal the sample mean and the sample standard deviation of the data. Fortunately, several convenient statistical packages can draw the histogram with the normal curve superimposed. As an example, the figure below shows such a histogram for the 61 most recent observations of the monthly rate of return of Exxon. It was produced by SPSS.

[Figure: histogram of the monthly rate of return of Exxon with a normal curve superimposed. Horizontal axis from -.125 to .175; Mean = .014, Std. Dev = .05, N = 61.]

The histogram uses only 61 observations, whereas the superimposed normal curve depicts what the histogram would look like with infinitely many observations from a normal distribution. Therefore, sampling errors show up as gaps between the two. The graph shows that the distribution of the rate of return of Exxon does not appear markedly different from a normal distribution, at least in the middle part.

Normal probability plot

The procedure described above is easy to understand, but it is not effective at revealing a subtle but systematic departure from normality. A better graphical check of normality is a normal probability plot. The plot can easily be developed using Excel, and we describe the process below.
The first step is to sort the data from the lowest to the highest. Let n be the number of observations. Then the lowest observation, denoted x(1), is the (1/n)th quantile of the data. A quantile times 100 is a percentile, so x(1) is also the (1/n) x 100th percentile of the data. With this convention, however, the largest observation x(n) becomes the 100th percentile of the data, which presents a problem: the 100th percentile of a normal distribution is infinity, a value that can never be observed. A suggested alternative is to define the i-th smallest observation, x(i), as the (i/(n+1))th quantile, or the (i/(n+1)) x 100th percentile, of the data. In the Excel worksheet on the next page both choices are computed for comparison.

The next step is to determine, for each observation, the corresponding quantile of the normal distribution that has the same mean and standard deviation as the data. The following Excel function is a convenient way to determine the normal (i/(n+1))th quantile, denoted x'(i):

x'(i) = NORMINV(i/(n+1), sample mean, sample standard deviation)

[Figure: normal curve labeled with i/(n+1) and x'(i).]

This value is the expected quantile if the data come from a normal distribution, so x(i) should be close to x'(i) if the distribution is indeed normal. These quantiles of the normal distribution, whose mean and standard deviation equal the sample mean and the sample standard deviation, are computed in the worksheet column with the heading "Expected."

A normal probability plot is a scatterplot of the data against the expected quantiles. The plot is shown below. The 45-degree line serves as a convenient reference line for detecting a systematic departure from normality: if the data indeed come from a normal distribution, the points should deviate from the reference line only in a random fashion.
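For readers working outside Excel, the sorting and quantile steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the worksheet itself: the `returns` values are hypothetical, the function name `expected_normal_quantiles` is made up for this example, the standard library's `statistics.NormalDist.inv_cdf` stands in for NORMINV, and "sample standard deviation" is assumed to mean the n-1 version (`statistics.stdev`).

```python
from statistics import NormalDist, fmean, stdev


def expected_normal_quantiles(data):
    """Return the sorted data and the matching expected normal quantiles."""
    xs = sorted(data)  # x(1) <= x(2) <= ... <= x(n)
    n = len(xs)
    # Normal distribution with the sample mean and sample standard deviation.
    # Assumption: the n-1 standard deviation is intended.
    fitted = NormalDist(fmean(xs), stdev(xs))
    # x'(i) is the i/(n+1) quantile of the fitted normal distribution.
    expected = [fitted.inv_cdf(i / (n + 1)) for i in range(1, n + 1)]
    return xs, expected


# Hypothetical monthly returns, for illustration only.
returns = [0.031, -0.012, 0.054, -0.047, 0.008, 0.022, -0.019]
observed, expected = expected_normal_quantiles(returns)
for x, e in zip(observed, expected):
    print(f"observed {x:8.4f}   expected {e:8.4f}")
```

Plotting the observed values against the expected values, together with the 45-degree line, gives the normal probability plot.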
[Figure: normal probability plot with the 45-degree reference line; axes run from about -0.15 to 0.20.]

We show below two normal probability plots, one for a sample of 60 independent observations from a uniform distribution over the interval (0, 1) and the other for a sample of 60 independent observations from an exponential distribution with mean 0.5. A uniform distribution places more probability near the two ends of its range than a normal distribution with the same mean and standard deviation does, and none beyond them. An exponential distribution, on the other hand, is skewed. These departures from normality are quite common, and the two plots show how each can be detected with a normal probability plot.

Normal Probability Plot of Data From a Uniform Population Distribution

The plot below is a normal probability plot of observations from a uniform distribution. The plot has an elongated S shape.

[Figure: normal probability plot; vertical axis "Uniform" from 0.0 to 1.0, horizontal axis "expunif" from 0.0 to 1.0.]

Normal Probability Plot of Data From an Exponential Population Distribution

The plot below is a normal probability plot of observations from an exponential distribution. The plot is convex.

[Figure: normal probability plot; vertical axis "Expon", horizontal axis "expexpo".]

Jarque-Bera Statistic

A normal probability plot can be inconclusive when the pattern in the plot is not clear. In such a case it is useful to compute a few numbers that measure non-normality. The asymmetry of the distribution is measured by the skewness, which is the third central moment of the distribution in standardized form:

$$\alpha_3 = E\left[\left(\frac{x - \mu}{\sigma}\right)^{3}\right]$$

where $\mu$ and $\sigma$ are the population mean and standard deviation. The sample skewness is evaluated as follows:

$$\hat{\alpha}_3 = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{\hat{\sigma}}\right)^{3}, \qquad \text{where} \qquad \hat{\sigma} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^{2}}$$

The skewness $\alpha_3$ is 0 for a symmetric population, as can be seen from the formula. Therefore, if the sample skewness $\hat{\alpha}_3$ is significantly different from 0, then one can infer that the population distribution is unlikely to be symmetric and hence unlikely to be normal.
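The step from symmetry to zero skewness can be spelled out in one line. The following is a sketch that writes $X$ for the variable and $\mu$, $\sigma$ for its population mean and standard deviation (notation assumed here), and that assumes the third moment exists. If the distribution is symmetric about $\mu$, then $X - \mu$ and $-(X - \mu)$ have the same distribution, so

$$E\left[(X - \mu)^{3}\right] = E\left[\left(-(X - \mu)\right)^{3}\right] = -E\left[(X - \mu)^{3}\right] \quad\Longrightarrow\quad E\left[(X - \mu)^{3}\right] = 0,$$

and dividing by $\sigma^{3}$ gives a skewness of 0. The converse does not hold: a skewness of 0 does not by itself guarantee that the distribution is symmetric.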
Another number that can be used to check the normality of the distribution is the fourth central moment of the distribution in standardized form, called the kurtosis $\alpha_4$:

$$\alpha_4 = E\left[\left(\frac{x - \mu}{\sigma}\right)^{4}\right]$$

The sample kurtosis is computed as follows:

$$\hat{\alpha}_4 = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{\hat{\sigma}}\right)^{4}$$

The kurtosis measures the amount of probability in the tails of the distribution and equals 3 for a normal population distribution. Therefore, if the sample kurtosis $\hat{\alpha}_4$ is significantly different from 3, then one can infer that the population distribution is unlikely to be a normal distribution.

The Jarque-Bera statistic combines the sample skewness $\hat{\alpha}_3$ and the sample kurtosis $\hat{\alpha}_4$ as follows:

$$JB = \frac{n}{6}\left[\hat{\alpha}_3^{2} + \frac{1}{4}\left(\hat{\alpha}_4 - 3\right)^{2}\right]$$

For a large number of observations, a JB value higher than 6 suggests that the population distribution is unlikely to be a normal distribution. The threshold comes from the fact that, for a normal population and large n, JB approximately follows a chi-square distribution with 2 degrees of freedom, whose 95th percentile is 5.99.

The Excel worksheet below illustrates the computation of these statistics for the rate of return of the Weyerhaeuser stock for the 60 most recent months.

Month  Weyerhaeuser   z-score  (z-score)^3  (z-score)^4
    1       0.27020    2.8469      23.0741      65.6902
    2       0.09449    0.9295       0.8031       0.7464
    3       0.08873    0.8667       0.6509       0.5641
    4      -0.02731   -0.3996      -0.0638       0.0255
    5      -0.10706   -1.2699      -2.0478       2.6004
    6       0.02551    0.1768       0.0055       0.0010
    7       0.02139    0.1318       0.0023       0.0003
    8       0.08088    0.7810       0.4764       0.3721
    9      -0.05442   -0.6955      -0.3364       0.2339
   10      -0.27098   -3.0586     -28.6127      87.5141
   11      -0.06645   -0.8267      -0.5649       0.4670
   12       0.10320    1.0246       1.0756       1.1021
   13      -0.00968   -0.2073      -0.0089       0.0018
   14       0.13487    1.3701       2.5722       3.5242
   15      -0.10145   -1.2086      -1.7656       2.1339
   16      -0.02065   -0.3269      -0.0349       0.0114
   17      -0.02000   -0.3198      -0.0327       0.0105
   18       0.11735    1.1789       1.6386       1.9318
   19      -0.08037   -0.9787      -0.9373       0.9173
   20      -0.02010   -0.3209      -0.0331       0.0106
   21      -0.00513   -0.1575      -0.0039       0.0006
   22       0.01753    0.0898       0.0007       0.0001
   23       0.02051    0.1223       0.0018       0.0002
   24       0.01005    0.0081       0.0000       0.0000
   25       0.08657    0.8431       0.5992       0.5052
   26      -0.04167   -0.5563      -0.1721       0.0957
   27       0.02899    0.2147       0.0099       0.0021
   28       0.10986    1.0973       1.3211       1.4496
   29       0.01709    0.0850       0.0006       0.0001
   30      -0.07143   -0.8810      -0.6839       0.6025
   31       0.16018    1.6464       4.4625       7.3469
   32      -0.01181   -0.2305      -0.0122       0.0028
   33      -0.05976   -0.7537      -0.4282       0.3227
   34      -0.06610   -0.8229      -0.5572       0.4586
   35       0.00459   -0.0515      -0.0001       0.0000
   36       0.00913   -0.0019       0.0000       0.0000
   37      -0.10679   -1.2669      -2.0333       2.5760
   38       0.00000   -0.1016      -0.0010       0.0001
   39       0.06154    0.5699       0.1851       0.1055
   40      -0.04155   -0.5550      -0.1709       0.0949
   41       0.11735    1.1789       1.6386       1.9318
   42      -0.06849   -0.8490      -0.6120       0.5196
   43      -0.04216   -0.5617      -0.1772       0.0995
   44      -0.15026   -1.7413      -5.2795       9.1929
   45      -0.02439   -0.3677      -0.0497       0.0183
   46      -0.07875   -0.9609      -0.8873       0.8526
   47       0.07586    0.7262       0.3830       0.2782
   48       0.12179    1.2275       1.8495       2.2702
   49       0.06514    0.6092       0.2261       0.1378
   50       0.00543   -0.0423      -0.0001       0.0000
   51       0.04324    0.3703       0.0508       0.0188
   52       0.11606    1.1649       1.5806       1.8412
   53       0.14085    1.4354       2.9572       4.2447
   54      -0.11934   -1.4039      -2.7669       3.8843
   55       0.05794    0.5307       0.1494       0.0793
   56      -0.01339   -0.2477      -0.0152       0.0038
   57       0.00452   -0.0522      -0.0001       0.0000
   58       0.00631   -0.0328       0.0000       0.0000
   59      -0.14932   -1.7310      -5.1869       8.9787
   60       0.17021    1.7558       5.4131       9.5046
AVERAGE     0.00931   0.00000     -0.03913      3.75464
STDEVP      0.09164

JB = 1.43903

The average of the (z-score)^3 column is the sample skewness and the average of the (z-score)^4 column is the sample kurtosis. Since JB = 1.43903 is well below 6, the statistic gives no indication that the Weyerhaeuser returns depart from normality.

[Figure: plot of the Weyerhaeuser data; axes run from about -0.3 to 0.3.]
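The worksheet computation can also be sketched in Python as a cross-check. This is an illustration under stated assumptions rather than a reproduction of the Excel file: the function name `jarque_bera` is made up for this example, the 60 returns are copied from the worksheet above, and `statistics.pstdev` (which divides by n) is assumed to correspond to the worksheet's STDEVP.

```python
from statistics import fmean, pstdev


def jarque_bera(data):
    """Return (sample skewness, sample kurtosis, JB) for a list of numbers."""
    n = len(data)
    mean = fmean(data)
    sd = pstdev(data)  # divides by n; assumed to match Excel's STDEVP
    z = [(x - mean) / sd for x in data]   # z-scores
    skew = fmean([v ** 3 for v in z])     # average of (z-score)^3
    kurt = fmean([v ** 4 for v in z])     # average of (z-score)^4
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    return skew, kurt, jb


# Weyerhaeuser returns, months 1 to 60, as listed in the worksheet.
weyerhaeuser = [
    0.27020, 0.09449, 0.08873, -0.02731, -0.10706, 0.02551,
    0.02139, 0.08088, -0.05442, -0.27098, -0.06645, 0.10320,
    -0.00968, 0.13487, -0.10145, -0.02065, -0.02000, 0.11735,
    -0.08037, -0.02010, -0.00513, 0.01753, 0.02051, 0.01005,
    0.08657, -0.04167, 0.02899, 0.10986, 0.01709, -0.07143,
    0.16018, -0.01181, -0.05976, -0.06610, 0.00459, 0.00913,
    -0.10679, 0.00000, 0.06154, -0.04155, 0.11735, -0.06849,
    -0.04216, -0.15026, -0.02439, -0.07875, 0.07586, 0.12179,
    0.06514, 0.00543, 0.04324, 0.11606, 0.14085, -0.11934,
    0.05794, -0.01339, 0.00452, 0.00631, -0.14932, 0.17021,
]

skew, kurt, jb = jarque_bera(weyerhaeuser)
print(f"skewness = {skew:.5f}, kurtosis = {kurt:.5f}, JB = {jb:.5f}")
```

The worksheet reports a skewness of -0.03913, a kurtosis of 3.75464, and JB = 1.43903; the sketch should land close to these values, with any small differences due to the returns being rounded to five decimals.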