Directional (Circular) Statistics Directional or circular distributions are those that have no true zero and any designation of high or low values is arbitrary: • Compass direction • Hours of the day • Months of the year Time can be converted to an angular measurement using the equation: (360 0 )( X ) a k where a is the angular measurement, X is the time period, and k is the number of time units on the circular measurement scale. What is the angular measurement of 6:15 a.m. (6.25a.m.)? (Remember to use a 24hr clock…) What is the angular measurement of February 14th? (Remember to use total days…) (360 0 )(6.25hr ) a 93.750 24hrs (360 0 )(45th day) a 44.38 0 365 days To analyze directional data they must first be transformed into rectangular polar coordinates. • First, we specify a ‘unit circle’ that has a radius of 1. • The polar location is then defined as the angular measurement and its intersection with the unit circle. • The cosine and sine functions are then used to place this location (based on the angle and unit distance) into a standardized Cartesian space. x y sin a r r cos 30 0.50 sin 30 0.87 cos a cos 60 0.50 sin 60 0.87 Note that the coordinates of opposite angles are identical. Also note that the x and y axes are opposite of the typical Cartesian plane. Death Valley Moving Rocks, vectors (degrees) Alsore, Chile Mean Angle (Azimuth) The mean angle can not simply be the sum of the angles divided by the sample size, because the mean angle of 359º and 1º (north) would be 180º (south)! Therefore we use the following equations: n Y sin i 1 n a n X cos i 1 a n r X 2 Y 2 cos a X r sin a Y r sin a cos a r arctan where X and Y are the rectangular coordinates of the mean angle, and r is the mean vector. Determining the Quadrant • Sin +, Cos + : the mean angle is computed directly. • Sin +, Cos - : the mean angle = 180 – θr. • Sin -, Cos - : the mean angle = 180 + θr. • Sin -, Cos + : the mean angle = 360 - θr. Rocks Vectors 341 330 301 299 9 7 359 334 353 15 27 28 25 23 350 30 26 22 8 356 ∑ Sin (Azimuth) -0.32557 -0.50000 -0.85717 -0.87462 0.15643 0.12187 -0.01745 -0.43837 -0.12187 0.25882 0.45399 0.46947 0.42262 0.39073 -0.17365 0.50000 0.43837 0.37461 0.13917 -0.06976 0.34763 Cos (Azimuth) 0.94552 0.86603 0.51504 0.48481 0.98769 0.99255 0.99985 0.89879 0.99255 0.96593 0.89101 0.88295 0.90631 0.92050 0.98481 0.86603 0.89879 0.92718 0.99027 0.99756 17.91415 First take the sine and cosine of the angles (azimuths) and sum them. In Excel the formula is: =sin(radians(cell #)) and =cos(radians(cell #)). n 20 sin Y a 0.34763 cos 0.34763 0.01738 20 a 17.91415 X 17.91415 0.89571 20 r 0.017382 0895712 0.00030 0.80229 0.8959 cos a 0.89571 0.9998 0.8959 0.0194 arctan 1.11 0.9998 sin a 0.01738 0.0194 0.8959 Ignore the sign. Since the sine is + and the cosine is +, we read the answer directly, regardless of the sign. Therefore the angle that corresponds to cos(0.9998), sin(0.01941) is 1.110... essentially due north. The value of r is also a measure of angular dispersion, similar to the standard deviation with a few exceptions: • Unlike the standard deviation it ranges from 0 – 1. • A value of 0 means uniform dispersion. • A value of 1 means complete concentration in one direction. r X Y 2 2 Andean Villages Principal Street Azimuths (labeled slope direction, degrees) Note that these azimuths are not truly unidirectional but are bidirectional. For example, azimuth 330 has an opposite azimuth of 330-180=150. 2 E o Bimodal Data When angular data have opposite azimuths they are said to have diametrically bimodal circular distributions. The mean angle of diametrically bimodal data is orthogonal (at right angles) to the true mean angle and a poor representation of the data. One method of dealing with diametrically bimodal circular data is to use a procedure called angle doubling. Data must be perfectly diametrical for this procedure to work. Angle Doubling Procedure: • Each angle (ai) is doubled (e.g. 34 x 2 = 68). • If the doubled angle is < 360º then it is recorded as 2ai. • If the doubled angle is ≥ 360º then 360 is subtracted from it and the results recorded as 2ai. • Then proceed as normal in calculating the mean angle. Andean Village Subset (n=30) Principal Azimuths (diametrically bimodal data) 356(2) – 360 = 352 Village Azimuth (ai) 2ai Sin(2ai) Cos(2ai) 356 359 11 350 334 357 358 358 19 7 6 6 334 21 348 176 179 191 170 154 177 178 178 199 187 186 186 154 201 168 352 358 22 340 308 354 356 356 38 14 12 12 308 42 336 352 358 22 340 308 354 356 356 38 14 12 12 308 42 336 ∑ -0.1392 -0.0349 0.3746 -0.3420 -0.7880 -0.1045 -0.0698 -0.0698 0.6157 0.2419 0.2079 0.2079 -0.7880 0.6691 -0.4067 -0.1392 -0.0349 0.3746 -0.3420 -0.7880 -0.1045 -0.0698 -0.0698 0.6157 0.2419 0.2079 0.2079 -0.7880 0.6691 -0.4067 -0.8515 0.9903 0.9994 0.9272 0.9397 0.6157 0.9945 0.9976 0.9976 0.7880 0.9703 0.9781 0.9781 0.6157 0.7431 0.9135 0.9903 0.9994 0.9272 0.9397 0.6157 0.9945 0.9976 0.9976 0.7880 0.9703 0.9781 0.9781 0.6157 0.7431 0.9135 26.8976 Y 0.8515 0.0284 30 X 26.8976 0.8966 30 r (0.0284) 2 (0.8966) 2 0.8047 0.8970 cos 2ai 0.8966 0.9996 0.8970 sin 2ai 0.0284 0.0317 0.8970 0.0317 arctan 0.0005 0 . 9996 The mean angle is therefore 360-0.0005 or effectively 360º. Testing the Significance of the Directional Mean Directional statistics are more sensitive to small sample sizes and it is important to test the directional mean for significance… not something typically done with conventional measures. Rayleigh z Test We can use the Rayleigh z test to test the null hypothesis that there is no sample mean direction: H0: There is no sample mean direction. Ha: There is a sample mean direction. We determine the Rayleigh z statistic using the equation: z nr 2 where n is the sample size and r is taken from the mean angle equation. The Rayleigh’s test has a few very important assumptions: • The data are not diametrically bidirectional. • The data are unimodal, meaning there are not more than one clustering of points around the circle. So for the Andean village azimuth which are diametrically bidirectional we must use the data from the angle doubling procedure. From the Andean Village example: n 30 r 0.897 z (30)(0.897 2 ) 24.14 zCritical 2.971 Since 24.14 > 2.971 reject Ho. There is a mean direction of 360 (or 0) degrees in the principal azimuths of the Andean villages (Rayleigh z24.14, p < 0.001). Hypothesis Testing: Uniformity We can also test the hypothesis that the azimuths are not uniformly distributed (occur equally around the compass). Ho: The distribution of slope aspects is not significantly different than uniform around the compass. Ha: The distribution of slope aspects is significantly different than uniform around the compass. Observed Distribution Uniform Distribution To test this hypothesis we can use the ratio of the observed slope aspects to the expected (uniform) slope aspects and χ2. • If the sample size is reasonable large (> 30) this technique works well. • If possible, group the data such that no group has less than 4 observations. • Sometimes grouping in this way is not possible. First we need to calculate the expected values. Since we are using a uniform distribution, the expected values are: ˆf Sample Size i Categories and all expected values for each group will be the same. We then test the X2 statistic, which is calculated as: 2 ˆ ( fi fi ) 2 fˆ i The azimuths can be categorized by sector (NE, SW, etc...) based on these compass directions: N NE E SE S SW W NW 337.5 – 22.5 22.5 – 67.5 67.5 – 112.5 112.5 – 157.5 157.5 – 202.5 202.5 – 247.5 247.5 – 292.5 292.5 – 337.5 ˆf 19 2.375 i 8 fi ˆfi East 5 2.375 North 2 2.375 Northeast 6 2.375 Northwest 1 2.375 South 1 2.375 Southeast 4 2.375 Southwest 0 2.375 West 0 2.375 n= 19 (5 2.375) 2 (2 2.375) 2 (6 2.375) 2 (1 2.375) 2 (1 2.375) 2 (4 2.375) 2 (0 2.375) 2 (0 2.375) 2 2.375 2.375 2.375 2.375 2.375 2.375 2.375 2.375 2 2 2.90 0.06 5.53 0.80 0.80 1.11 2.375 2.375 2 15.95 df k 1 8 1 7 2 critical 14.067 Since 15.95 > 14.067, reject Ho. The distribution of slope aspects is significantly different than a uniform distribution around the compass (X215.95, 0.05 > p > 0.025). Two-Sample Hypothesis Testing We can test the hypothesis that two sets of azimuths are not significantly different in a procedure similar to the MannWhitney U test called Watson’s U2 test. Ho: The two groups of principal azimuths are not significantly different. Ha: The two groups of principal azimuths are significantly different. Watson’s U2 test equation: 2 d n1 n2 k 2 2 U 2 d k N N where N n1 n2 • First the data are ranked across groups • n1 and n2 are the group sample sizes. • The expected frequency (i/n1 and j/n2) are then calculated. Group A Group B i 1 2 3 4 5 6 Group A a1i (deg) 55 60 70 80 95 120 i/n1 0.1667 0.3333 0.5000 0.5000 0.6667 0.6667 0.8333 0.8333 0.8333 1.0000 1.0000 1.0000 1.0000 1.0000 n1 6 n2 8 j Group B a2i (deg) 1 75 2 90 3 4 100 110 5 6 7 8 135 140 150 160 d k U 2 0.0907 2 U Critical 0.1964 dk = i/n1 - j/n2 0.1667 0.3333 0.5000 0.3750 0.5417 0.4167 0.5833 0.4583 0.3333 0.5000 0.3750 0.2500 0.1250 0.0000 d 4.9583 (6)(8) (4.9583) 2 U 2.1265 14 2 14 2 j/n2 0.0000 0.0000 0.0000 0.1250 0.1250 0.2500 0.2500 0.3750 0.5000 0.5000 0.6250 0.7500 0.8750 1.0000 2 k d2 k 0.0278 0.1111 0.2500 0.1406 0.2934 0.1736 0.3402 0.2100 0.1111 0.2500 0.1406 0.0625 0.0156 0.0000 2.1265 2 dk n1n2 2 U 2 d k N N 2 Since 0.0907 < 0.1964 accept Ho. The two groups of principal azimuths of Andean villages are not significantly different (U20.0907, 0.50 > p > 0.20). Serial Randomness of Nominal Data on a Circle Used to test the hypothesis that the occurrence of categorical data around a circle is random. Ho: The distribution of data is not significantly different than random. Ha: The distribution of data is significantly different than random. • Can only be used for 2 categories of data. • Uses the Runs test statistic for sample sizes of 20 or less. • Use the normal approximation for larger sample sizes. Normal Approximation Equation 2n1n2 N u '1 N Z 2n1n2 (2n1n2 N N 2 ( N 1) where Z is the normal approximation, u’ is the total number of runs in the data set, n1 and n2 are the group sample sizes, and N is the total sample size (n1 + n2). n = 42 360º/42 = 8.5º Towns (n1) = 17 Villages (n2) = 25 Runs = 14 This could also simply be represented as dots along a line, similar to the Runs test for sequential data. n1 17 n2 25 u ' 14 Z Z 2(17)(25) 42 42 2(17)(25) 2(17)(25) 42 422 (42 1) 14 1 892 42 (850)(808) 72324 15 6.238 2.02 3.082 p 0.0217 Since the probability is less than 0.05, we reject Ho. If a one-tailed test is used, u’ values less than the lower of the u’ critical values from the Runs table of circular data signify that the non-randomness is due to clustering. If the u’ values are greater than the higher of the u’ critical values from the Runs table of circular data it signifies that the nonrandomness is due to uniformity. If the normal approximation is used, negative Z values signify that the non-randomness is due to clustering, while positive values signify uniformity. Therefore in our example: The distribution of towns and villages is significantly different than random due to clustering (Z-2.02, p = 0.0217).