Directional (Circular) Statistics

advertisement
Directional (Circular) Statistics
Directional or circular distributions are those that have no true
zero and any designation of high or low values is arbitrary:
• Compass direction
• Hours of the day
• Months of the year
Time can be converted to an angular measurement using the
equation:
(360 0 )( X )
a
k
where a is the angular measurement, X is the time period, and
k is the number of time units on the circular measurement scale.
What is the angular measurement
of 6:15 a.m. (6.25a.m.)?
(Remember to use a 24hr clock…)
What is the angular measurement
of February 14th?
(Remember to use total days…)
(360 0 )(6.25hr )
a
 93.750
24hrs
(360 0 )(45th day)
a
 44.38 0
365 days
To analyze directional data they must first be transformed into rectangular polar
coordinates.
• First, we specify a ‘unit circle’ that has a radius of 1.
• The polar location is then defined as the angular measurement and its
intersection with the unit circle.
• The cosine and sine functions are then used to place this location
(based on the angle and unit distance) into a standardized Cartesian
space.
x
y
sin a 
r
r
cos 30  0.50
sin 30  0.87
cos a 
cos 60  0.50
sin 60  0.87
Note that the coordinates of opposite
angles are identical. Also note that the
x and y axes are opposite of the typical
Cartesian plane.
Death Valley Moving
Rocks, vectors (degrees)
Alsore, Chile
Mean Angle (Azimuth)
The mean angle can not simply be the sum of the angles divided
by the sample size, because the mean angle of 359º and 1º (north)
would be 180º (south)! Therefore we use the following equations:
n
Y
 sin
i 1
n
a
n
X
 cos
i 1
a
n
r  X 2 Y 2
cos a 
X
r
sin a 
Y
r
 sin a 

cos
a


 r  arctan 
where X and Y are the rectangular coordinates of the mean
angle, and r is the mean vector.
Determining the Quadrant
• Sin +, Cos + : the mean angle is computed directly.
• Sin +, Cos - : the mean angle = 180 – θr.
• Sin -, Cos - : the mean angle = 180 + θr.
• Sin -, Cos + : the mean angle = 360 - θr.
Rocks Vectors
341
330
301
299
9
7
359
334
353
15
27
28
25
23
350
30
26
22
8
356
∑
Sin (Azimuth)
-0.32557
-0.50000
-0.85717
-0.87462
0.15643
0.12187
-0.01745
-0.43837
-0.12187
0.25882
0.45399
0.46947
0.42262
0.39073
-0.17365
0.50000
0.43837
0.37461
0.13917
-0.06976
0.34763
Cos (Azimuth)
0.94552
0.86603
0.51504
0.48481
0.98769
0.99255
0.99985
0.89879
0.99255
0.96593
0.89101
0.88295
0.90631
0.92050
0.98481
0.86603
0.89879
0.92718
0.99027
0.99756
17.91415
First take the sine and cosine of the angles (azimuths) and sum them.
In Excel the formula is: =sin(radians(cell #)) and =cos(radians(cell #)).
n  20
 sin
Y
a
 0.34763
 cos
0.34763
 0.01738
20
a
17.91415
X
17.91415
 0.89571
20
r  0.017382  0895712  0.00030  0.80229  0.8959
cos a 
0.89571
 0.9998
0.8959
 0.0194 
arctan 
  1.11
 0.9998 
sin a 
0.01738
 0.0194
0.8959
Ignore the sign.
Since the sine is + and the cosine is +, we read the answer directly,
regardless of the sign. Therefore the angle that corresponds to
cos(0.9998), sin(0.01941) is 1.110... essentially due north.
The value of r is also a measure of angular dispersion, similar
to the standard deviation with a few exceptions:
• Unlike the standard deviation it ranges from 0 – 1.
• A value of 0 means uniform dispersion.
• A value of 1 means complete concentration in one direction.
r  X Y
2
2
Andean Villages
Principal Street Azimuths
(labeled slope direction, degrees)
Note that these azimuths
are not truly unidirectional
but are bidirectional.
For example, azimuth 330
has an opposite azimuth
of 330-180=150.
2
E
o
Bimodal Data
When angular data have opposite azimuths they are said to have
diametrically bimodal circular distributions.
The mean angle of diametrically bimodal data is orthogonal (at right
angles) to the true mean angle and a poor representation of the
data.
One method of dealing with diametrically bimodal circular data
is to use a procedure called angle doubling.
Data must be perfectly diametrical for this procedure to work.
Angle Doubling Procedure:
• Each angle (ai) is doubled (e.g. 34 x 2 = 68).
• If the doubled angle is < 360º then it is recorded as 2ai.
• If the doubled angle is ≥ 360º then 360 is subtracted from it
and the results recorded as 2ai.
• Then proceed as normal in calculating the mean angle.
Andean Village Subset
(n=30)
Principal Azimuths
(diametrically bimodal data)
356(2) – 360 = 352
Village Azimuth (ai)
2ai
Sin(2ai)
Cos(2ai)
356
359
11
350
334
357
358
358
19
7
6
6
334
21
348
176
179
191
170
154
177
178
178
199
187
186
186
154
201
168
352
358
22
340
308
354
356
356
38
14
12
12
308
42
336
352
358
22
340
308
354
356
356
38
14
12
12
308
42
336
∑
-0.1392
-0.0349
0.3746
-0.3420
-0.7880
-0.1045
-0.0698
-0.0698
0.6157
0.2419
0.2079
0.2079
-0.7880
0.6691
-0.4067
-0.1392
-0.0349
0.3746
-0.3420
-0.7880
-0.1045
-0.0698
-0.0698
0.6157
0.2419
0.2079
0.2079
-0.7880
0.6691
-0.4067
-0.8515
0.9903
0.9994
0.9272
0.9397
0.6157
0.9945
0.9976
0.9976
0.7880
0.9703
0.9781
0.9781
0.6157
0.7431
0.9135
0.9903
0.9994
0.9272
0.9397
0.6157
0.9945
0.9976
0.9976
0.7880
0.9703
0.9781
0.9781
0.6157
0.7431
0.9135
26.8976
Y
 0.8515
 0.0284
30
X
26.8976
 0.8966
30
r  (0.0284) 2  (0.8966) 2  0.8047  0.8970
cos 2ai 
0.8966
 0.9996
0.8970
sin 2ai 
 0.0284
 0.0317
0.8970
  0.0317 
arctan 
  0.0005
0
.
9996


The mean angle is therefore 360-0.0005 or effectively 360º.
Testing the Significance of the Directional Mean
Directional statistics are more sensitive to small sample sizes and
it is important to test the directional mean for significance… not
something typically done with conventional measures.
Rayleigh z Test
We can use the Rayleigh z test to test the null hypothesis that
there is no sample mean direction:
H0: There is no sample mean direction.
Ha: There is a sample mean direction.
We determine the Rayleigh z statistic using the equation:
z  nr
2
where n is the sample size and r is taken from the mean angle
equation.
The Rayleigh’s test has a few very important assumptions:
• The data are not diametrically bidirectional.
• The data are unimodal, meaning there are not more than one
clustering of points around the circle.
So for the Andean village azimuth which are diametrically
bidirectional we must use the data from the angle doubling
procedure.
From the Andean Village
example:
n  30
r  0.897
z  (30)(0.897 2 )  24.14
zCritical  2.971
Since 24.14 > 2.971 reject Ho.
There is a mean direction of 360 (or 0) degrees in the principal
azimuths of the Andean villages (Rayleigh z24.14, p < 0.001).
Hypothesis Testing: Uniformity
We can also test the hypothesis that the azimuths are not
uniformly distributed (occur equally around the compass).
Ho: The distribution of slope aspects is not significantly
different than uniform around the compass.
Ha: The distribution of slope aspects is significantly different
than uniform around the compass.
Observed Distribution
Uniform Distribution
To test this hypothesis we can use the ratio of the observed
slope aspects to the expected (uniform) slope aspects and χ2.
• If the sample size is reasonable large (> 30) this technique works
well.
• If possible, group the data such that no group has less than 4
observations.
• Sometimes grouping in this way is not possible.
First we need to calculate the expected values. Since we are
using a uniform distribution, the expected values are:
ˆf  Sample Size
i
Categories
and all expected values for each group will be the same.
We then test the X2 statistic, which is calculated as:
2
ˆ
( fi  fi )
2
 
fˆ
i
The azimuths can be categorized
by sector (NE, SW, etc...) based
on these compass directions:
N
NE
E
SE
S
SW
W
NW
337.5 – 22.5
22.5 – 67.5
67.5 – 112.5
112.5 – 157.5
157.5 – 202.5
202.5 – 247.5
247.5 – 292.5
292.5 – 337.5
ˆf  19  2.375
i
8
fi
ˆfi
East
5
2.375
North
2
2.375
Northeast
6
2.375
Northwest
1
2.375
South
1
2.375
Southeast
4
2.375
Southwest
0
2.375
West
0
2.375
n=
19
(5  2.375) 2 (2  2.375) 2 (6  2.375) 2 (1  2.375) 2 (1  2.375) 2 (4  2.375) 2 (0  2.375) 2 (0  2.375) 2
 







2.375
2.375
2.375
2.375
2.375
2.375
2.375
2.375
2
 2  2.90  0.06  5.53  0.80  0.80  1.11  2.375  2.375
 2  15.95
df  k  1  8  1  7
2
 critical
 14.067
Since 15.95 > 14.067, reject Ho.
The distribution of slope aspects is significantly different than a
uniform distribution around the compass (X215.95, 0.05 > p > 0.025).
Two-Sample Hypothesis Testing
We can test the hypothesis that two sets of azimuths are not
significantly different in a procedure similar to the MannWhitney U test called Watson’s U2 test.
Ho: The two groups of principal azimuths are not significantly
different.
Ha: The two groups of principal azimuths are significantly
different.
Watson’s U2 test equation:
2




d
n1 n2

k
2
2

U  2  d k 
N 
N 


where
N  n1  n2
• First the data are ranked across groups
• n1 and n2 are the group sample sizes.
• The expected frequency (i/n1 and j/n2) are then calculated.
Group A
Group B
i
1
2
3
4
5
6
Group A
a1i (deg)
55
60
70
80
95
120
i/n1
0.1667
0.3333
0.5000
0.5000
0.6667
0.6667
0.8333
0.8333
0.8333
1.0000
1.0000
1.0000
1.0000
1.0000
n1  6
n2  8
j
Group B
a2i (deg)
1
75
2
90
3
4
100
110
5
6
7
8
135
140
150
160
d
k
U 2  0.0907
2
U Critical
 0.1964
dk = i/n1 - j/n2
0.1667
0.3333
0.5000
0.3750
0.5417
0.4167
0.5833
0.4583
0.3333
0.5000
0.3750
0.2500
0.1250
0.0000
d
 4.9583
(6)(8) 
(4.9583) 2 
U 
2.1265 

14 2 
14

2
j/n2
0.0000
0.0000
0.0000
0.1250
0.1250
0.2500
0.2500
0.3750
0.5000
0.5000
0.6250
0.7500
0.8750
1.0000
2
k
d2 k
0.0278
0.1111
0.2500
0.1406
0.2934
0.1736
0.3402
0.2100
0.1111
0.2500
0.1406
0.0625
0.0156
0.0000
 2.1265
2

dk  
n1n2 

2

U  2  d k 
N 
N 


2
Since 0.0907 < 0.1964 accept Ho.
The two groups of principal azimuths of Andean villages are not
significantly different (U20.0907, 0.50 > p > 0.20).
Serial Randomness of Nominal Data on a Circle
Used to test the hypothesis that the occurrence of categorical
data around a circle is random.
Ho: The distribution of data is not significantly different than
random.
Ha: The distribution of data is significantly different than random.
• Can only be used for 2 categories of data.
• Uses the Runs test statistic for sample sizes of 20 or less.
• Use the normal approximation for larger sample sizes.
Normal Approximation Equation
2n1n2  N
u '1 
N
Z
2n1n2 (2n1n2  N
N 2 ( N  1)
where Z is the normal approximation, u’ is the total number of runs
in the data set, n1 and n2 are the group sample sizes, and N is the
total sample size (n1 + n2).
n = 42
360º/42 = 8.5º
Towns (n1) = 17
Villages (n2) = 25
Runs = 14
This could also simply be represented as dots along a line, similar to the Runs test for
sequential data.
n1  17
n2  25
u '  14
Z
Z
2(17)(25)  42
42

2(17)(25) 2(17)(25)   42
422 (42  1)
14  1 
892
42
(850)(808)
72324
15 
 6.238
 2.02
3.082
p  0.0217
Since the probability is less than 0.05, we reject Ho.
If a one-tailed test is used, u’ values less than the lower of the u’
critical values from the Runs table of circular data signify that the
non-randomness is due to clustering.
If the u’ values are greater than the higher of the u’ critical values
from the Runs table of circular data it signifies that the nonrandomness is due to uniformity.
If the normal approximation is used, negative Z values signify that
the non-randomness is due to clustering, while positive values
signify uniformity.
Therefore in our example:
The distribution of towns and villages is significantly different
than random due to clustering (Z-2.02, p = 0.0217).
Download