Breaking records and breaking boards

advertisement
2
NED GLICK
[January
For this departmentwe hope to obtain brief papersdealing with the subjects
NOTES.
CLASSROOM
currentlytaughtin undergraduatemathematics.Occasionallywe shall also accept notes dealingwith
material commonly encountered in the first year of graduate study. Classroomnotes should be
relevant to some actual classroom situation that can be expected to exist at a good number of
institutions.
MATHEMATICAL
EDUCATION.There is now a growingandvery healthyconcernwith mathematicians'
role as teachers.This departmentwelcomes reportsof experimentsin novel methodsof teachingas
well as discussionsof all educationalaspects of our profession.
PROBLEMS.This departmentattractsmany readers and is an especially valuable feature of the
MONTHLY.It will continueto seek and publishinterestingelementaryand advancedproblemsin pure
and applied mathematics,both classicaland modern.
REVIEWS.Throughthe telegraphicreviews the MONTHLYwill continue to provide as complete a
coverage as possible of the current textbooks. At present the MONTHLYiS the only journal that
performsthis servicefor the mathematicalcommunity.Extendedreviewswill continueto appear,and
we hope that the numberof classroomreviews will increase.
R. P. BOAS,
Editor
BREAKINGRECORDSAND BREAKINGBOARDS
NED GLICK
Part I.
1.
2.
3.
4.
5.
6.
What a Statistician Can Do With a Minimum of
Probability or the Probability of a Minimum
Introduction
Weather Records
Tests of Randomness
Car Caravans in a One-Lane Tunnel
Sequential Strategy for Destructive Testing
Tolerance Limits for Failure Distributions
Part II. A Beginner's Guide to Record Breaking Mathematics
7. Persistence of Record Breaking and Divergence of the
Harmonic Series
8. Frequency of Record Breaking
9. Serial Numbers of Record Breaking Trials
10. Waiting Times Between Record Breaking Trials
11. The Record Value Sequence
12. Extreme Values and Extremal Processes
1. Introduction.When I take observations in chronological sequence, how often will the
outstandingrecordvalue be surpassed?For example,supposethat I registerthe annualtotal inchesof
precipitationor, to take a less gloomy statistic,the total hoursof brightsunshinein Vancouver,where
I live: what is the probabilitythat next year will be a new maximum?
Breakthroughsare less likely later than early in a sequence of observations.The firstobservation
necessarilymustbe a "recordhigh."But, priorto observingany values,I knowthat the second of two
numbersin randomsequencehas equal probabilityof being smalleror largerthan the first.Hence the
probabilityis exactly50%that a second, independentobservationwill be a new recordhigh surpassing
the initial record, assumingthat there cannot be an exact tie (if measurementis arbitrarilyprecise).
1978]
BREAKING RECORDS AND BREAKING BOARDS
3
Fromthe same perspective,there is probability1/3 that a thirdtrialwill be a new maximum,since the
last of three repeatedobservationsis equally likely to be smallest,middle, or largest.... Similarly,all
10 ranksare equally likely for the tenth observation;so maximumrankfor the tenth observationhas
probability1/10.
The theoretical expected or average number of record highs in a chronologicalsequence of n
independentobservationsis the sum of these probabilities:1 + (1/2) + (1/3)+..- + (1/n).
This resultsurprisesmy acquaintanceswho have no backgroundin probabilitytheory.I have here
the beginningof a conversationalploy to hold a person'sinterestwhen she or he asks whatwork I do
and I say, "I am a statistician."Otherwisethis reply may ruin a fine conversation,as in the dinner
scene depicted by WilliamKruskal[221.
My ploy proceedsto verify predictionsfrom the simple probabilitymodel with some real weather
records.It happensthat these weatherrecordsexcellentlyillustratethe pathosand problemsof work
with statisticaldata.
It is just as interestingto note that the simple model does not fit record breakingin athletic
competitions.Race times, jump heights, and throw distanceshave improvedover several decades,
while rainfallfluctuationsfromyear to year are "random."In fact, the frequenciesof recordhighsand
lows can be used to infer whether observationsindicate linear trend or randomsequence.
After the weatherand sports, I talk about traffic.Curiously,the simple model of randomrecord
values says something about how cars tend to bunch together behind a slow vehicle.
The same probabilitymodel applies in yet another context: a sequential strategy to find the
weakest item in a sample of boardsor beams, and hence to establish"tolerancelimits"for lumber
strengths.My interest in recordhighs and lows actuallybegan in discussionsat the WesternForest
ProductsLaboratoryof the CanadianForestryService.Breakinga random,but usuallysmall,fraction
of the availablebeamscan accomplishthe same purposeas 100%destructivelaboratorytesting.This
conservationof materialillustratesthe economic spirit of experimentaldesign.
So one example,the frequencyof recordhighsor lows, ties togetherseveral"applied"aspectsof a
statistician'slivelihood.More surprising,the intuitiveidea that any recordcan be beaten also leads to
grows without bound, becoming
mathematicalproof that the harmonicsum 1 + (1/2)+ (1/3)+
finite
number.
than
any
bigger
Harmonicdivergenceis the simplestof many"well known"limit theoremsand paradoxesrelated
to record breaking. The mathematicalsections comprisingthe second part of my paper review
primarilythose results in the theory of record values which do not depend on the particular
distributionof the basic sequence of observations.This mathematicsis mostly at the level of an
undergraduatediscrete probabilitycourse in the spirit of WilliamFeller's famous text [121.1 avoid
differentiationand integrationas much as possible.
2. Weatherrecords. Weatherrecordsin Canadaare kept by the MeteorologicalBranchof the
federalDepartmentof Transport.Vancouver'sweatherrecords,however,are largelyproductsof "the
colourfulcareersof the two Shearmanbrothers":T. S. H. Shearman,who "wasofficiallyappointedby
the governmentas weatherobserverfor the city of Vancouver"in April, 1905, and E. B. Shearman,
who replacedhis brotherfrom 1915to 1948,for whichservicehe receivedthe BritishEmpireMedal.
This historyI found in a Departmentof Transport1964Annual MeteorologicalSummary[251,from
which most of my figuresare taken.
Monthly,as well as annualinches of precipitationand hoursof sunshineare shown in Data Set 1
and Data Set 2, respectively.Looking down the Februarycolumn,for example, I find 5 recordhigh
yearsin the precipitationdata(1900,1901,1902,1918,1961)and5 in the sunshinedata. Eachsequence
includes65 observations;so 1 + (1/2)+ (1/3)+ .. + (1/65)= 4.76 is the theoreticalexpectednumberof
record highs.
The 12 monthly precipitationsequences actually give an empiricalaverage of 4.17 highs per
sequence. Record lows have the same distributionas highs, if the case of absolutelynil precipitation
4
[January
NED GLICK
DATASET1.
Monthly & Annual Total Precipitation (Rain + One-Tenth Snow) in Vancouver, B.C., in Inches
Year Jan.
1900
1901
1902
1903
1906
1907
1908
1909
1910
1911
1912
1913
1914
1915
1916
1917
1918
1919
1920
1921
1922
1923
1924
1925
1926
1927
1928
1929
1930
1931
1932
1933
1934
1935
1936
1937
1938
1939
1940
1941
1942
1943
1944
1945
1946
1947
1948
1949
1950
1951
1952
1953
7.24
11.28
6.68
8.23
9.66
9.32
7.60
6.20
11.19
6.11
8.47
9.62
10.56
7.13
5.96
9.33
11.05
7.57
8.92
9.39
3.15
8.73
8.26
12.16
7.62
9.08
9.04
3.05
2.75
11.24
9.18
9.57
11.90
20.65
9.37
2.42
6.59
11.91
5.65
8.04
3.56
4.25
7.85
9.33
11.30
10.68
3.76
.84
6.48
11.10
8.23
14.08
Feb.
Mar. Apr.
5.95 10.29 4.51
6.31 3.04 5.29
10.17 7.45 3.11
2.60 6.88 3.78
6.03 2.37 1.04
8.30 2.39 4.13
6.31 7.14 2.61
8.15 4.31 1.23
5.01 2.91 3.60
3.37 3.05 1.96
.89 3.92
6.25
4.38 5.38 2.53
4.87 3.33 3.28
4.42 4.18 3.04
7.40 14.55 4.07
5.87 5.61 8.20
10.50 7.48 1.70
7.35 6.61 4.47
1.21 5.20 2.51
6.66 2.53 3.62
4.75 3.44 2.63
4.06 3.86 2.14
8.86 1.16 3.84
6.01 4.24 2.44
6.70 2.48 2.58
4.73 6.80 1.88
1.87 7.01 4.29
1.64 4.47 4.81
9.42 3.33 4.72
4.95 6.35 4.57
6.65 9.15 4.84
6.32 6.43 .53
3.02 5.57 1.18
3.75
8.41 2.28
4.37 6.12 3.31
8.62 3.64 7.16
4.68 4.65 2.93
5.94 3.83 1.71
8.11 7.39 4.57
6.24 3.82 2.59
2.82 4.25 3.88
4.69 3.94 4.16
4.63 3.41 4.15
7.45 8.54 4.23
9.36 8.79 7.54
7.02 6.52 5.62
10.31 2.67 3.59
7.58 5.05 2.22
10.07 9.42 5.01
10.28 6.40 2.82
5.55 5.99 2.72
4.79 4.93 2.38
May
June July
4.20 5.42
4.38 5.01
4.40 1.97
3.68 3.56
3.58 3.04
1.44 1.43
4.11 1.86
3.76 1.69
2.15 1.98
5.39 2.09
2.35 2.28
4.33 3.81
.74 3.58
3.42 1.07
1.41 1.34
1.69 5.40
1.15 1.00
3.60 1.02
1.94 3.06
2.52 3.64
.17
2.46
2.84 2.07
.91
.31
.38
2.20
4.17
.78
5.12 1.16
2.22 1.93
1.25 3.24
2.86 2.18
1.23 5.59
1.34 2.08
4.33 2.03
.69
3.44
.55 1.06
5.23 3.13
3.14 6.14
.78
1.86
3.40 3.03
.30
2.85
5.73 2.19
1.89 5.28
3.28 1.20
1.90
.95
.94
2.64
.43 4.87
1.74 2.08
6.05 11.82
1.73 1.76
2.61 1.48
.37
3.50
2.23 3.92
2.31 2.26
1.05
83
2.37
1.12
.45
1.70
1.59
2.45
.24
.92
1.54
2.02
.42
.91
5.25
.48
2.29
.15
.67
.32
.02
.52
.71
.74
.36
.94
.47
1.41
.08
.44
5.32
1.76
1.86
1.79
1.65
.47
.66
2.34
1.52
.48
3.36
1.10
.49
.91
1.63
2.34
2.16
2.75
2.15
.01
.40
1.29
Aug.
3.60
.22
1.15
1.07
.83
1.36
1.15
1.43
1.38
1.23
5.86
.85
.75
.36
.58
.93
4.59
1.15
2.91
2.84
2.01
.73
1.88
2.36
2.10
3.74
.20
1.50
.07
.61
2.01
1.12
1.24
1.36
2.11
3.45
.74
.68
1.70
2.12
.47
2.12
.92
1.27
.78
.49
4.03
1.46
2.93
.90
1.27
1.83
Sept.
1.61
2.65
3.39
8.35
8.87
4.51
1.46
2.23
2.47
4.41
2.84
3.89
6.86
.80
1.28
3.30
.30
1.16
10.37
5.03
5.76
2.97
5.81
.44
3.26
3.07
1.35
1.77
2.65
7.14
2.62
5.89
2.77
2.57
3.51
1.84
1.68
3.20
2.15
8.26
.79
1.82
4.86
3.21
1.94
2.18
2.83
1.45
1.65
3.64
1.08
4.76
Oct.
Nov.
Dec.
Annual
9.20
5.20
4.72
5.72
7.60
1.76
6.68
7.06
9.04
2.24
4.64
6.19
6.37
8.83
2.16
3.49
7.56
3.14
8.34
10.08
3.26
2.56
6.56
3.00
6.37
7.22
7.38
3.57
7.50
5.22
5.66
7.94
4.99
6.70
4.21
8.14
5.74
6.72
10.85
9.86
6.20
7.16
5.87
7.10
5.91
10.29
4.69
6.83
10.24
5.66
2.13
5.26
10.00
14.06
10.33
11.36
8.25
13.23
13.69
15.66
10.62
12.68
9.21
10.08
10.18
5.41
6.37
5.23
7.02
11.25
7.14
8.99
2.63
6.13
5.87
4.05
7.80
8.39
5.36
2.54
3.00
8.34
10.17
5.45
9.56
4.65
1.84
11.01
6.44
12.35
4.91
6.93
6.11
3.29
8.97
10.07
6.19
5.22
14.57
11.89
5.01
6.71
2.34
10.44
9.22
8.09
9.55
4.21
7.33
8.02
8.41
4.28
8.79
8.82
8.80
3.95
2.84
10.36
5.71
11.72
8.07
9.74
11.01
5.56
10.35
15.88
8.35
13.05
8.99
6.92
5.34
8.58
5.22
11.92
7.37
12.85
12.27
8.46
10.63
10.94
13.53
11.78
10.47
9.15
9.59
9.16
3.76
6.45
8.40
13.32
9.59
7.62
10.50
6.17
7.97
11.28
72.29
66.36
65.29
60.56
59.05
57.59
62.61
58.45
59.38
52.27
57.05
57.03
53.78
49.93
56.08
61.25
62.71
57.21
63.28
61.18
40.63
52.49
52.52
51.07
53.21
59.05
46.46
37.83
43.78
67.60
66.39
64.22
58.49
62.23
55.48
66.97
50.28
66.89
60.47
65.41
48.20
46.17
47.76
62.14
67.14
67.50
66.07
51.18
67.55
57.56
43.83
65.61
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
BREAKING RECORDS AND BREAKING BOARDS
19781
1954
1955
1956
1957
1958
1959
1960
1961
1962
1963
1964
1965
1966
10.00 8.39
5.09 4.17
7.10 6.47
3.52 3.79
13.48 6.75
8.51 6.83
7.43 7.04
13.28 15.26
7.57 2.37
1.76 7.53
11.62 3.82
9.54 10.67
9.28 4.53
2.58
7.41
7.72
6.96
3.51
8.97
5.49
6.04
4.35
4.29
6.44
2.30
4.87
3.92
4.32
1.12
2.87
3.60
3.66
3.13
2.98
5.02
4.09
3.27
2.48
1.75
2.13
3.22
.84
1.72
1.33
2.78
5.64
3.96
2.87
1.81
2.75
2.89
2.89
2.79
3.03
6.57
2.99
1.51
3.94
2.05
1.42
1.42
2.24
2.58
0.62
2.18
2.11
2.97
.89
2.81
Nil
.97
.02
1.34
1.20
3.08
3.56
0.51
3.36
4.35
.31
2.52
1.84
2.35
.80
4.15
3.56
5.35
.83
2.33
2.57
1.66
4.45
1.87
5.65
1.72
3.10
7.03
2.08
1.94
3.12
1.86
7.41
0.67
3.17
3.15
7.08
13.69
3.88
5.93
4.99
11.00
8.82
5.77
8.71
3.38
8.67
7.43
5
16.10
13.59
13.86
4.59
9.03
8.09
7.06
7.70
11.55
9.33
10.17
6.68
9.47
9.39
7.14
13.71
9.24
9.44
8.24
7.13
10.04
12.49
12.92
6.71
7.43
15.34
69.36
60.20
70.14
45.93
60.03
64.81
62.22
76.34
63.08
58.45
64.04
55.03
65.93
53
54
55
56
57
58
59
60
61
62
63
64
65
DATASET2.
Monthly and Annual Total Hours of Bright Sunshine in Vancouver, B.C.
Year Jan.
Feb.
Mar. Apr.
May
June
July
Aug.
Sept.
Oct.
Nov.
Dec.
Annual
1909 55.3 56.4 143.7 252.4 230.3
1910 45.5 91.5 117.7 181.2 276.5
14.4 84.8 154.0 248.2 174.9
1911
1912 44.2 81.4 233.8 113.8 215.4
1913 43.3 97.1 125.6 149.5 178.2
1914 38.7 46.4 104.1 155.8 270.2
1915 57.7 44.1 131.8 203.0 163.7
1916 65.9 84.8 86.4 141.5 216.9
1917 53.9 56.7 153.5 90.4 220.6
1918 45.3 95.8 103.5 257.5 253.0
1919 50.6 58.2 128.2 147.2 257.5
1920 44.8 148.4 108.0 196.1 249.0
1921 31.6 85.9 137.2 180.1 275.0
1922 77.4 98.5 116.5 143.7 243.2
1923 52.3 75.2 163.4 194.7 181.4
1924 24.6 59.2 157.8 168.0 250.3
1925 49.4 49.7 137.3 168.7 261.0
19Z6 27.4 56.1 187.7 220.0 186.4
1927 39.9 96.3 128.8 164.5 199.9
1928 42.6 102.7 116.2 168.4 278.4
1929 50.5 109.7 119.3 169.3 249.0
1930 100.7 85.8 169.5 142.7 254.2
1931 28.5 80.2 109.8 215.2 274.7
1932 65.4 97.0 94.6 147.4 246.3
1933 36.8 86.1 117.8 223.4 146.1
1934 56.7 118.8 136.5 237.5 236.2
1935 44.8 94.3 84.8 246.9 277.5
1936 54.6 70.5 121.6 168.5 174.0
1937 87.1 69.7 93.7 79.0 221.1
1938 30.7 78.0 110.5 175.9 313.1
1939 20.8 75.9 116.2 166.5 206.7
1940 42.9 63.4 109.0 167.2 242.6
1941 38.7 93.0 161.4 184.6 191.8
1942 51.2 75.7 101.6 120.6 140.0
1943 52.8 104.3 107.7 149.5 177.1
1944 50.9 80.7 155.7 121.4 213.0
1945 51.9 75.5 77.6 129.9 200.2
1946 38.2 42.0 95.2 88.4 287.1
251.8
186.5
217.7
229.0
186.6
250.5
266.7
222.9
188.5
318.8
250.1
206.5
137.4
257.0
228.6
238.0
254.3
288.6
219.6
195.7
187.9
219.0
187.9
297.0
198.3
301.0
194.7
212.9
201.6
292.1
143.9
329.2
164.2
192.7
234.7
201.2
191.5
135.7
224.4
312.8
272.7
198.1
270.2
313.5
261.7
145.5
358.7
297.4
341.8
308.7
317.2
267.5
289.1
273.9
327.7
300.3
305.1
285.1
322.2
320.6
381.2
195.2
336.3
236.2
247.3
313.3
292.6
314.6
278.7
236.5
326.0
236.4
251.5
259.1
283.3
232.9
262.8
224.1
225.1
186.6
228.0
280.7
221.9
284.4
348.2
232.4
265.5
308.6
221.1
193.2
291.7
238.0
240.7
236.9
178.4
258.1
304.9
305.6
334.1
237.7
302.3
270.2
257.1
295.0
203.0
253.9
320.6
266.1
231.2
291.0
203.4
179.9
252.1
267.7
294.9
181.2
142.2
185.9
173.2
79.4
141.3
190.5
153.8
236.2
202.9
138.2
172.1
163.4
226.4
217.8
205.2
221.7
124.2
199.2
225.2
192.0
118.5
235.3
123.7
155.3
213.2
181.4
164.6
161.6
193.6
180.1
119.2
175.1
201.0
167.0
154.0
144.9
99.2
103.6
140.0
110.7
110.6
101.6
84.6
150.9
104.5
77.1
122.7
91.4
123.7
106.7
142.8
118.7
116.0
93.8
78.3
94.3
113.7
125.9
124.9
115.1
112.4
126.5
109.0
130.1
121.2
132.7
108.7
73.9
80.1
116.3
126.0
108.0
124.0
122.6
58.1
49.9
41.6
19.4
62.1
39.0
70.2
84.7
50.9
48.4
60.7
77.8
42.9
71.7
49.9
54.3
59.7
60.5
28.8
45.0
61.7
54.1
96.8
37.8
42.9
34.2
59.1
52.3
37.1
68.4
36.8
48.6
71.2
61.3
42.1
38.5
33.5
66.5
51.4
31.0
48.3
26.0
27.6
67.1
36.4
23.5
23.1
58.4
73.5
20.0
57.5
23.3
30.0
62.4
10.7
30.5
56.5
38.9
25.1
40.7
23.2
63.0
30.4
43.8
26.9
36.6
34.0
52.6
30.1
37.6
55.9
44.8
43.2
42.2
32.8
17.8
1880.7
1801.5
1763.9
1644.3
1652.0
1747.0
1683.1
1697.9
1802.8
2023.8
1958.9
1897.5
1781.7
1762.1
1925.5
1863.0
1880.4
1909.9
1720.3
1824.6
1938.5
2010.8
1975.0
1831.8
1756.5
1952.9
1855.6
1810.8
1604.7
1984.1
1698.5
1798.0
1717.3
1606.7
1693.3
1617.6
1606.3
1539.0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
6
NED GLICK
[January
DATA SET 2 (continued).
Monthly and Annual Total Hours of Bright Sunshine in Vancouver, B.C.
Year Jan.
1947
1948
1949
1950
1951
1952
1953
1954
1955
1956
1957
1958
1959
1960
1961
1962
1963
1964
1965
1966
1967
1968
1969
1970
1971
1972
1973
35.8
45.9
83.3
58.5
35.9
8.9
17.3
30.7
12.8
35.0
69.1
22.6
43.7
50.9
45.0
42.1
69.2
18.9
39
30
42
46
70
46
37
61
59
Feb.
94.6
62.2
91.9
48.0
82.2
80.4
62.8
49.9
78.7
46.6
79.1
38.0
56.2
124.9
30.3
77.7
62.6
99.7
76
84
83
156
103
132
86
72
88
Mar. Apr.
149.0
134.7
105.2
63.0
121.7
83.3
94.3
155.6
106.8
102.2
96.6
131.8
77.3
120.2
80.4
116.8
98.2
67.4
222
95
123
77
152
163
127
75
106
169.5
115.0
128.5
158.4
290.0
157.1
118.9
143.3
191.2
237.4
159.5
197.4
182.9
165.8
119.9
96.0
91.3
154.9
168
189
151
175
106
176
152
136
219
May
June
July
Aug.
Sept.
Oct.
223.1
143.0
271.7
231.5
203.7
221.0
189.9
218.2
182.9
312.2
248.1
308.9
236.3
135.6
203.5
119.5
277.6
212.5
253
263
177
254
280
228
264
232
259
199.3
244.1
237.9
223.0
300.2
183.0
101.6
170.9
162.0
124.6
190.7
220.5
219.9
199.4
322.6
232.8
146.1
124.0
330
181
275
224
247
262
112
168
207
259.5
207.4
247.3
309.4
316.6
332.2
278.3
240.7
196.5
333.5
201.7
365.0
331.9
388.1
310.9
245.5
177.5
210.8
310
258
319
334
311
288
352
351
280
264.5
130.6
165.6
294.3
276.8
248.4
195.5
134.1
309.1
278.5
245.8
286.7
220.3
181.6
293.8
148.8
225.3
229.5
225
281
340
216.1
215.8
297
260
325
263
193.2
184.5
209.6
222.6
201.2
215.1
169.0
116.3
148.2
147.6
227.1
164.4
128.5
210.7
152.8
170.2
166.5
169.1
179
153
215
177
131
192.0
150
165
192.4
72.9
118.5
136.6
55.0
81.5
153.1
107.4
100.0
81.3
75.5
135.2
114.6
92.8
115.6
137.8
87.0
78.6
151.6
111
112
74.3
73.6
165
149
127
182
74.1
SUMMARY OF RECORD BREAKTHROUGHS
Nov.
54.5
63.2
50.6
33.9
33.3
52.3
37.0
38.6
51.5
40.5
66.3
51.3
71.6
75.3
80.8
35.4
42.0
82.4
46
46
74
43
71
75
42
73
39
Dec.
15.9
30.5
43.5
20.1
29.1
21.8
20.0
23.4
37.1
24.8
36.2
10.9
32.1
81.3
20.9
28.3
18.5
54.0
52
26
47
60
21
40
41
64
29
Annual
1731.8
1479.6
1771.7
1747.7
1972.2
1756.6
1392.0
1421.7
1558.1
1758.4
1755.4
1912.1
1693.5
1849.4
1798.7
1400.1
1453.4
1574.8
2010
1718
1920
1836
1873
2048
2013
1904
1815
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
IN 65-YEAR SEQUENCES
Precipitation
Jan.
Record Highs
Record Lows
Reverse Highs
Reverse Lows
4
10
7
3
Feb.
5
3
3
5
Mar. Apr.
2
4
5
4
3
4
8
3
May
6
6
42
71
June Jul.
4
7
5
5
Aug. Sep.
Oct.
Nov.
Dec.
Average
5
8
3
4
2
4
4
5
6
4
4
4
4
4
6
5
4
7
5
5
5
5
2
6
4.17
5.50
4.71
4.71
6
3
4
5
4
5
4
6
1
4
6
7
7
6
2
3
5
4
5
5
4
7
3
5
4.25
5.25
4.17
5.00
Sunshine
Record Highs
Record Lows
Reverse Highs
Reverse Lows
4
4
6
6
5
6
3
4
3
7
5
4
3
5
3
6
4
7
5
5
5
5
4
4
can be excluded (an almost valid assumptionin Vancouver);and these 12 monthly precipitation
sequencesgive an average of 5.42 lows. Similarcounts of "backward"highs and lows are obtained
readingthe 12 sequencesin reverse,frommost recentbackto the oldest observations.(Sincethe May
precipitationfor 1965,recordedwith two decimals,ties the figurefor 1966,I treat May 1965as 1/2 in
the counts of reverse order highs and lows.) Consideringhighs and lows, chronologicaland reverse
order,the 48 sequencesgive an overallempiricalaverageof 4.77 recordbreakingvaluesper sequence
(compareto the theoreticalexpectationof 4.76).
1978]
BREAKING RECORDS AND BREAKING BOARDS
7
All these counts can be repeated for the monthly sunshinesequences in Data Set 2.
The accompanyingSummaryof Record Breakthroughsshows all 96 counts(24 monthlydata sets,
consideringhighs and lows, forwardand backwardsequences).More than half of these 65-year-long
sequences have exactly 4 or 5 breakthroughs.
People I ask usuallyguess that therewill be morethan5 recordvaluesin 65 years.In predictingthe
numberof record highs for a much longer sequence, say 1,000 or 1,000,000observations,human
intuition, even among mathematiciansand statisticians,is definitely extravagantcomparedto the
simple model's expectationsof 7.49 and 14.39, respectively.
Over a long time, say a hundredor a thousandor ten thousandyears,there may be shiftsor cycles
of climate;so the basic model of "interchangeable"weatheryears may not fit well. If climatetrends
are not negligible,then the actualprobabilitiesfor recordbreakingwill differfrom what the simple
In fact, the observed frequenciesof record highs and lows can be used to infer
model predicts
whether or not data are a randomsample.*
3. Testsof randomness.At a meetingof the Royal StatisticalSociety abouttwenty-fiveyearsago,
F. G. Fosterand A. Stuart[14] pointedout that recordlow and recordhigh annualrainfallsat Oxford
were muchmorerarethanrecordbreakingperformances(low timesor high distances)in annualtrack
and field competitionsof the British Amateur Athletic Association.
This contrast is not surprising:athletic recruitingand training have intensified over the past
century;but no one has done muchabout the weather.Althoughathleticperformancesdo fluctuate,
there is an averagetrendover decadesfor nationalcompetitors(and,therefore,winners)to runfaster,
jumphigher,or throwfarther;while weatherfluctuationsover a centuryare more intuitivelyrandom,
without dramaticlinear trend.
Of course it is possible for 100 randomobservationsto be ordered so that the sequence has as
many as 10 or 50 or 100 recordhighs. But detailedcalculation[6] shows that the probabilityof 10 or
morerecordhighsin a 100-longrandomsequenceis less than5%. Therefore,in a situationwheredata
are less familiarthan rainfallsor race times, the mere findingof many recordhighs or lows suggests
that the data are not a simplerandomsample;that is, an alternativehypothesisshouldbe soughtto fit
the data better.
Foster and Stuart[14] gave formalproceduresusing the sum or the differenceof recordhigh and
record low frequenciesto fit or to test the hypothesisof randomness.Other statisticianshave also
consideredsuch inference procedures[3, 4, 14, 15].
* Returning to the Shearman brothers, I note that their locale of weather observation moved several times.
Nonetheless, official records report precipitation for "Vancouver (City)" or "Port Meteorological Office" through
1966. Beyond that date there are reports for 10 separate city locations, plus the University of British Columbia and
Vancouver Airport. Since rainfalls at nearby city locations often differ by more than an inch in a single month, I
have not extended the precipitation sequences in Data Set 1 beyond 1966. Also I omit 1903 and 1904 because some
months of precipitation data are missing from official records for those years.
For years since 1964 the monthly sunshine figures in Data Set 2 are taken from annual reports by the British
Columbia provincial Department of Agriculture [5], because I could not find relevant annual publications by the
federal Department of Transport. Although the provincial figures are derived from the same source, these monthly
sunshine totals are rounded to the nearest whole hour, dropping one decimal. The provincial practice of first
rounding and then adding the monthly figures may yield a different annual total than the federal total obtained by
adding the monthly figures and then rounding-off. Moreover, other discrepancies which I noticed in comparing
federal and provincial figures for 1960-1964 confirm that there are fairly frequent copying or typographic errors in
one or the other publication (or both). Also the federal 1964 summary makes a substantial mistake in summing its
own monthly sunshine figures to obtain its 1964 annual total. Apart from this addition error (which I noticed by
accident) I have made no attempt in Data Sets 1 and 2 to find and correct the various errors carried over from my
sources.
To count record highs and lows in the monthly sunshine sequences, I have broken several ties between
rounded-off figures by finding more exact values from daily weather summaries.
... And there you have examples of the complications which arise in compiling real data, no matter how easy
the job seems before you do it.
8
NED GLICK
[January
recordhighs,sayR. andR ', for any
It canbe shownthatthe numberof forwardandbackward
astheforwardrecordhighsandlows,say
distribution
sequencehaveexactlythesamejointprobability
Rn andLn[4].And,assamplesizen -* oo,thecountsof recordhighsandlowsbecomeapproximately
independent[141.
4. Carcaravansin a one-lanetunnel.Whentrafficmovingin one directionis confinedto a single
lane,a slowcaris likelyto be followedcloselyby a queueof vehicleswhosedriverswishto go faster,
butwhocannotpass.If thereis no exit fromthislane,thenmoreandmorefollowingvehicleswill
... untiltherehappensto be a
catchup andbe addedto the slow moving"platoon"or "caravan"
itsown
at a lowerspeed.Thisvehiclewillnotcatchup,butwillaccumulate
followingvehicletravelling
caravan.
Thuscarswhosedriversall desiredifferentspeedsin factwilltravelin caravansat actualspeeds
determinedby recordlowsin the sequenceof desiredspeeds.
modelto a randomsequenceof drivers,the frequencyof record
Applyingthe simpleprobability
of caravansformedby n drivers.Andthenumbersof trialsbetween
to the number
lowscorresponds
successiverecordbreakinglow valuescorrespondto the lengthsof caravans.
Sincecaravanswill be successivelyslower,separationsbetweencaravanswill increaseas time
passes.G. F. Newell[29]usedthisreasoningto explainwhycarsneartheexitof a longtunneltendto
travelfasterandin smallerbunchesmorewidelyseparatedthancarsin thetunnelneartheentrance.
Thismodelof trafficflowalso has been mentionedby otherauthors[13, 19, 29, 38].
5. Sequentialstrategyfor destructivetesting.Manyproductsfail understress.For example,a
ceases
component
forceis appliedto it;anelectronic
perpendicular
woodbeambreakswhensufficient
anda batterydiesunderthestressof time.But
of toohightemperature;
to functionin anenvironment
items.
the precisebreakingstressor failurepointvarieseven among"identical"
by graduallyincreasing
SupposethatI canobservean item'sexactfailurepointin a laboratory
testingof 100itemsI couldfindalltheir
time,etc.).Fromsuchdestructive
stress(force,temperature,
failurepoints,sayX1,X2,.. ., X100.ButnowsupposethatI onlyneedto findthe weakestitemin my
sample:I onlywanttheminimumvalueamongfailurestressesX1,X2,.. ., X1i. ThenI neednotstress
mostof the itemsto theirfailurepoints.
Testthe
sequentially.
The minimumfailurestressamonganybatchof itemscanbe determined
firstitemuntilit fails,andrecorditsfailurestressXi. Stopthenexttest(shortof failure)if thesecond
specimensurvivesthis amount:so the secondspecimen'sfailurestressX2 is determinedexactlyif
X2< Xl; otherwiseobtain only the "censored"informationthat X2> Xi, and hence Xl =
min(X1,X2).Ineithercaseproceedto thethirdspecimenandstopthetestif thisitemsurvivesa stress
equalto min(XI,X2):so X3 is observedonly if X3< min(Xi,X2);but min(Xi,X2,X3) is always
determined .... In general, the ii" item survives its stress test if X, > min(Xi.., X-,) =
min(X,,.. ., Xi); or the test concludes with stress-to-failure if Xi = min(X,. .,X,) <
min(XI,..., Xi-). In eithercase, the valuemin(X,.. .,Xi) is knownafterthe ilh trial.
Theitemsdestroyedin thissequentialprocedurearethosewith"recordlow"failurepoints.The
modelas the lowsin a sequenceof weather
frequencyof suchrecordlowsfitsthe sameprobability
records.For a sample of n items, the expected numberof items destroyedis 1 + (1/2)+ (1/3) +...
+
(1/n).Thisharmonicsumgrowsveryslowlycomparedto samplesize n. Forexample,thesumis only
5.19 whenn = 100andis only7.49whenn = 1000(see TableC).
The sequentialstrategyto find the minimumvaluegeneralizeseasilyto find the 2,3,..., or j
smallestvaluesamongXl, X2,. . ., Xn.To begin,testj itemsuntiltheyfail,at stressesXA,X2, ..., XA.
Thereafterstopthe ith trialif the itemsurvivesthej lowestfailurestressesamongall i - 1 previous
thatit is among
forthe ith item(i > j) is theprobability
of stress-to-failure
specimens.Theprobability
all ranksare
fromthe samecontinuousdistribution:
the j smallestof i independentobservations
is jli. Theexpectednumberof itemsdestroyedis the
equallylikelyforXi,so the desiredprobability
over all trials:
sumof failureprobabilities
BREAKING RECORDS AND BREAKING BOARDS
1978]
1.
1++
1
9
.
+ I j+ 1 j +2
+
..
n =i( 1+ +1+j+2
n)
j terms
-j
1 + 1+1+**
1)
If j is muchless than the samplesize n, then so is the expected numberof failures.For example,to
findthe weakest4 items in a sampleof 1000,I expect to destroyonly about26; and to findthe weakest
8 items I expect to destroy less than 50.
For some sortsof failuredistributions,notablythe exponential([13] page 41), the minimumor the
j smallestfailurestresses from a sample can be used to estimate the theoreticalmean of the failure
distribution.Usually, however,the sample minimumis used to estimate a low percentileratherthan
the mean or median of the failure distribution.
6. Tolerancelimitsfor failuredistributions.A buildingcode may prohibituse of a particulartype
of structuralcomponentunlessit has probabilityat least .95 of survivingsome severe stressx. In other
words,the failuredistribution'sfifthpercentileshouldsatisfyx.05> x. It is safer to under-estimatethe
percentile x.05than to over-estimate;so consider the lowest failure point observed in laboratory
testing. In a very large sample it is highly probablethat the breakingstrengthof the single weakest
item is below the x.05value. It can be calculated,for example, that sample size n = 90 is sufficiently
large to assure that x.05> min(Xl, X2,.. ., X.) with probability.99; so when this sample minimum
exceeds x, the inference x.05>x has ".99 confidence."
The smallestvalue in a randomsample of size n _ 90 also is called a level .99 tolerancelimit for
x.05,the fifth percentileof the sampled distribution.
The tolerancelimitinterpretationof smallorderstatisticsis aboutthirty-fiveyearsold [26,41, 45]. I
encounteredthis subjectwhen I joined a StatisticsCommitteeof the Study Groupon Wood Stresses
under auspices of the CanadianStandardsAssociation. (Vancouver'slumber export is important
enough to be illustratedon the cover of the Annual MeteorologicalSummary[25] cited before.) The
1970 "TentativeMethod for EvaluatingAllowable Propertiesfor Grades of StructuralLumber"[1]
mentionedthe sampleminimum(n = 90) as a level .99 tolerancelimit, althoughit did not suggestthe
sequential strategyfor findingit.
The 2nd smallestsamplevalue is a level .99 tolerancelimit for x.05when sample size n ' 130; and
the 3rd smallestsample value is a level .99 tolerancelimit when n _ 165. In general, the jth smallest
samplevalue is a level .99 tolerancelimitfor the fifthpercentilewhen n ' nj,wherevalues n, are given
in Table A for j = 1,2, 3,..., 15. The greaterthe value of j, the less the variabilityof the corresponding
tolerance limit and the less its conservativebias.
This table also gives values m,, the mean number of failures (items destroyed) in sequential
examinationof n, items to findthe j weakestamongthem. Becausethe ratio mj/n, is about 1/10, I can
afforda sample size about 10 times the numberof items I can actuallyaffordto break.For example,
TableA showsthat I mustsampleat least 228 items to use the 5th weakestas a level .99 tolerancelimit
for x.05;but I expect to destroy only 23.63 of the 228 sample items.
Table A can be used in findingnot only level .99, but also level .75, .90, or .95 tolerancelimitsfor
the fifth percentile.As the confidencelevel attachedto the sample minimumis increasedfrom .75 to
99, notice that samplesize must be tripled,from 27 to 90; but the increasein expected failures,from
3.89 to 5.08, is slight. Hence, level .99 may cost little more than level .75 tolerance limits.
Table B similarlygives samplesizes nj and expectedfailurenumbersm, for findingpercentilex.o1
tolerance limits, at confidence level .75, .90, .95, or .99.
By now my writinghas gone past the point I could ever carrya conversation.Often it is just as well
to leave some airof mysteryabouthow to calculatethe minimumsamplesizes in TablesA and B. But,
of course, for a mathematicianthis computationis quite elementary.
10
NED GLICK
TABLE
[January
A. Non -parametric Tolerance Limits for Fifth Percentile, x os
The jth smallest order statistic from a sample of n items is a non-parametric tolerance limit for the sampled
distribution's fifth percentile if sample size n ' n, where n, is given in the following tables for confidence level
-y = .75, .90, .95, or .99; that is, X,,) < x0 with probability = y. The sequential strategy to find the jth order statistic
is expected to destroy m, of the n, items sampled.
y = .75
j
n,
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
27*
53
75*
101*
124*
147*
170
192
215
237
259
281
303
325
346*
y =.90
m,
3.89
8.11
11.45
15.66
20.59
24.73
28.86
32.96
37.09
41.18
45.28
49.37
53.46
57.55
61.60
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0.144
0.153
0.153
0.155
0.166
0.168
0.170
0.172
0.173
0.174
0.175
0.176
0.176
0.177
0.178
1
58*
93
124
153
180~
207*
234
260
285*
311
336
361
385*
409*
434
m1/n1
0.098
0.116
0.119
0.133
0.138
0.142
0.146
0.148
0.151
0.153
0.154
0.156
0.157
0.158
0.160
y = .99
Ml
m,/n,
j
tn,
4.65
9.22
13.70
18.11
22.45
26.77
31.09
35.38
39.62
43.90
48.14
52.37
56.57
60.77
65.00
0.080
0.099
0.111
0.118
0.125
0.129
0.133
0.136
0.139
0.141
0.143
0.145
0.147
0.149
0.150
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
90
130
165
197*
228*
258*
287*
315*
343*
371
398
425
451
477*
503*
nj
4.39
8.83
12.46
17.52
21.80
26.04
30.27
34.50
38.69
42.88
47.07
51.26
55.44
59.59
63.74
45
76*
105
132
158
183*
208*
233*
257*
281*
305*
329*
353
376*
399*
y = .95
2
3
4
5
6
7
8
9
10
11
12
13
14
15
m,
n,
m,/n,
m,
5.08
9.90
14.56
19.12
23.63
28.09
32.52
36.91
41.29
45.66
50.00
54.33
58.63
62.92
67.21
m,/n,
0.056
0.076
0.088
0.097
0.104
0.109
0.113
0.117
0.120
0.123
0.126
0.129
0.130
0.132
0.134
* Asterisk indicates sample size n, such that P {X(",)< x.05} < -y,but the probability is as close as possible to y
(I)
(i.e., for sample size n, + 1, the probability P {X"'tj< x.05}> y and the excess over y is of greater magnitude than
(I)
< x.05} ' y and the
the deficit for sample size n, ). Absence of an asterisk means sample size n1 is such that P {X'"2
(j)
probability is as close as possible to y.
For any proportionp, the p th percentileof a continuousprobabilitydistributionF can be definedas
the unique value x, such that F(xp) = p. In terms of the distribution'sinverse function, the pth
percentilex, = F- (p). It is not difficult(see [8], page 7) to show that the jth smallestorder statistic
X(7) from n sample observationswill satisfy X,n) < xp with probability
n
i=j
1Fn
I
n
-
(n
=
11
BREAKING RECORDS AND BREAKING BOARDS
1978]
TABLE
B. Non-parametric Tolerance Limits for First Percentile, x 0,
The jth smallest order statistic from a sample of n items is a non-parametric tolerance limit for the sampled
distribution's first percentile if sample size n '- n,, where n, is given in the following tables for confidence level
=
yim= .75, .90, .95, or .99; that is XU"I
) < x.01with probability y.'The sequential strategy to find the jth order statistic
is expected to destroy m, of the n,i items sampled.
y = .75
j
n,
1
2
3
4
5
6
7
8
138
268*
391*
510
626*
741*
855
967*
y = .90
ml
m,/n,
j
n,
5.51
11.34
17.14
22.92
28.67
34.42
40.15
45.87
0.040
0.042
0.044
0.045
0.046
0.046
0.047
0.047
1
2
3
4
5
6
229*
388
531
666*
797*
925*
n1
1
2
3
4
5
298*
473
627*
773
913
m,/n,
6.01
12.08
18.06
23.98
29.88
35.75
0.026
0.031
0.034
0.036
0.037
0.039
y = .99
y = .95
j
ml
m,
m,/n,
j
6.28
12.47
10.56
24.58
30.56
0.021
0.026
0.030
0.032
0.033
1
2
3
4
5
n,
458*
661
837*
1001
1157
ml
m,/n,
6.71
13.14
19.42
25.62
31.74
0.015
0.020
0.023
0.026
0.027
*
Asterisk indicates sample size ni such that P{X, < x.}01< y, but the probability is as close as possible to y
(i.e., for sample size n, + 1, the probability PIXsj < x.o1}> y and the excess over y is of greater magnitude
than the deficit for sample size n,). Absence of an asterisk means sample size n, is such that P {X(7,)< x01} _ y and
the probability is as close as possible to y.
The subtractedsum is a binomialtail probabilitywhichcan be evaluatedusing binomialtables (or a
normalapproximation)or a calculator.Sincethis sum vanishesas samplesize n -) 00, one can find,for
any fraction y, a sample size nj(y) so large that the event xp> X,n ) has probability _ y for all
n _ nj(y).
samplesize n,(y) does not depend
Clearlythis constructionof tolerancelimitsis distribution-free:
on the form of the continuousdistributionfunction F(x) = P{Xi < x}.
7. Persistenceof recordbreakingand divergenceof the harmonicseries.No matterwhatthe
presentprecipitationrecordmaybe, it is certainthat eventuallytherewill occura yearwith morerain.
No matterhow manyrecordbreakingyearshave been countedto date, therewill be alwaysone more.
Such unit incrementsmake the count of record breaking years arbitrarilylarge as the years of
observationincrease indefinitely.
These statementsabout rainfallinterpreta specificmathematicaltheorem(and suggestits proof).
From this point onwardI emphasizeformal limit results ratherthan applicationsof the theory of
record values.
Formally,suppose that identicallydistributedrandomvariablesX1,X2,.. ., Xn are exchangeable
(in particular,it sufficesthat the sequenceof randomvariablesbe independent).The observationXi is
called a recordhigh or upperrecordvalue or laddervalue* if Xi strictlyexceeds all previousvaluesin
the sequence. For example, there are 3 recordhighs in the first 10 years of Januarysunshinefigures
* A term used by Feller [12].
12
NED GLICK
[January
(Data Set 2):
55.3, 45.5, 14.4. 44.2, 43.3, 38.7, 57.7, 65.9, 53.9, 45.3.
In this sequencethe recordhighsare XI = 55.3, X7 = 57.7, and X8 = 65.9. Assume that exact ties have
zero probability(arbitrarilyprecisemeasurementfroma continuousdistribution).Then Xi is a record
high if and only if Xi = max(X1,. .., Xi). Since all i ranksare equally likely for Xi, an upper record
value (maximumrank) has probability1/i at the ih trial.
Let Rn denote the numberof recordhighsamongthe firstn observations.A formalversionof the
statement at the beginningof this section is the following theorem:
The count of recordhighs Rn ?-*o withprobabilityone as sample size n -- .00
To prove this proposition,consider an initial sequence of n, observationsXI, X2, .. ., X", and a
furtherbatch of n2 observationsXn,,1.. ., Xnl+n2.
The probabilitythat this additionalbatch contains
no new record value is
= P{max(Xl, . . 9,,)
PjRnj=
Rnl
= max(X,,.. ., X*+j2)}-
+
the ratio of the initialsamplesize dividedby the total. By takingbatchsize n2 sufficientlylarge,I can
make this probabilityas smallas I wish. And I can repeatthis procedurefor manysuccessivebatches,
choosing sizes n2,n3,..
n, to satisfy
.
n,+n2</
(n, + n2)
(n,+ n2)+ n3
< Elr
4
(n, +**-+ nr-1)
+fnr-t)+nr<
(n1+*
so that, for arbitraryE >0,
P{no recordvalue in klh batch}? r(Eir) =,e.
E
k =l
But the probabilityof obtainingat least r recordhighsin a sequencewith length n _ ni + n2+.*.
is then
+
nr
P {R > r} ->P {at least one recordhigh in each of r batches}
=
1 - P{at least one of r batcheshas no recordhigh}
r
?1-
, P{no recordhigh in kth batch}
k= I
>1-E.
Since E is arbitrarilysmall, conclude that P{Rn > r}-* 1, for arbitrarilylarge integer r. That is, the
number of record highs Rn -? oc in probabilityas sequence length n -* Oo.Because the sequence
RI c R2 ?*** is monotone, it follows that also Rn-a oo with probabilityone.
Viewed in the theory of recurrentevents [12], record breakingis a persistentphenomenon.
1978]
13
BREAKING RECORDS AND BREAKING BOARDS
That the expectation E(Rn)-- oo is immediate: for any integer r and n ? n I+
as above,
n
n
E(Rn)=E k=l kP{Rn= k}
E
n2+
.n+f
kP{Rn= k}
k=r
r} > r(1- e).
_rP{Rn
In fact, the precedingargumentis an obvious modificationof the classic harmonicdivergenceproof
+ nr. The nth partial harmonicsum is:
with batch sizes nk = 2k-' and n = ni +
n2
1/n =
h+
1+2+3+
+
*
1+
+ 1)
+ (4 + 5 +
6
+ 7)
2r-1
> 1 +2/4+4/8+
terms
(r + 1)/2.
***+2r1/2r=
The other harmonicdivergencedemonstrationwhichis popularin calculustexts shows n=Xi-l
fnx-'dx = ln(n).
Several argumentsin subsequentsections exploit these propertiesof the harmonicseries.
8. The frequencyof recordbreaking. Define binaryvariatesto indicatethe trialsat whichrecord
highs occur in the originalsequence:
1 if X, = max(X1, X2,..
X,)
Y. =
0 otherwise,
with expected value and variancegiven by
E(Y,)= PIYi= 1}= P{Xi = max(X7,...,
V(Y.) = E(
[E(Y.
Y2)-_
X)} = lIi,
=
)]J2
1/i -
1/i2.
Any distinct pair Yi and Y, (say i < j) are uncorrelatedsince
E(YY, )= P{Y, = 1 and Y, = 1}
=P{X,= max(X1,...,
=P{Xi= max(X17
**
and
X,)
*7X, ) <
=P{X, = max(X1, . . .* X,
X, = max(X1,...7
max(X,,..
*. *7X}
xP{X,= max(X,+
1
i j j-i
ij
}
)}
x P{max(X1, .. ., X) < max(X,+*X}
l1/iu1
X}
X,)}
*
)}
X, )}
=P{Y, =1} P{Y, = 1}= E(Y,)E(Y,).
Thus the randomnumberof recordhighs among X1,
X2,..
.,
Xn
is Rn =
Y, with expectation
14
[January
NED GLICK
and variancegiven by
n
n
E(Rn)=
E(Y) =Eli
n i=l
V(Rn)
=EV(Yi)
n i=
=
n
=
l
1li-
11i2.
Since
1
7
Euler'sconstant= .5772... and
-ln (n)
n~~~~~~~~~'
1 IT2
1.6449...,
2
logarithmtables (or electronic calculators)easily give approximationsfor E(Rn) and V(Rn). Also
both expressionsare evaluatednumericallyin Table C for selected values of n: I find that the actual
numberssurprisemathematiciansas well as people who know little about logarithms.
TABLE C
Expectation, variance, and standard deviation of R", the number of record values
in a random sequence of n independent and identically distributed observations
E(Rn)
=El/i
V(Rn)
=
E,
l/-
I1i2
/V(Rn
n
E(Rn)
V(Rn)
2
3
4
5
6
7
8
9
10
1.50
1.83
2.08
2.28
2.45
2.59
2.72
2.83
2.93
.25
.47
.66
.82
.96
1.08
1.19
1.29
1.38
.50
.69
.81
.91
.98
1.04
1.09
1.14
1.17
20
30
40
50
60
65
70
80
90
100
3.60
3.99
4.28
4.50
4.68
4.76
4.83
4.97
5.08
5.19
2.00
2.38
2.66
2.87
3.05
3.13
3.20
3.33
3.45
3.55
1.41
1.54
1.63
1.70
1.75
1.77
1.79
1.83
1.86
1.88
200
300
400
500
600
700
800
900
1000
5.88
6.28
6.57
6.79
6.97
7.13
7.26
7.38
7.49
4.24
4.64
4.93
5.15
5.33
5.49
5.62
5.74
5.84
2.06
2.15
2.22
2.27
2.31
2.34
2.37
2.40
2.42
1,000,000
14.39
12.75
3.57
BREAKING RECORDS AND BREAKING BOARDS
1978]
15
The variance V(Rn) gives boundson the probabilityof destroyingtoo manyitems in a sequential
strategyfor destructivetesting.In particular,a one-sidedChebyshevinequality([13] page 152)implies
that
i- rlV(R,)
PtR
>
}
1
PIR,
-V(Rn,) +[r - E(Rn)]
For example, the values of E(Rn) and V(Rn) in Table C imply that P{Ri? 9} ' .14 and
P{R oo' 18}' .05 (and actuallythese bounds are quite conservativesince detailed calculation[6]
shows P{R. 9}= .022).
In a later section I show that the binaryvariates Y,, Y2,Y3,... are independentas well as pairwise
uncorrelated.Because of this independence,general limit theorems can be invoked to show more
preciselyhow the random sum Rn
Yi grows ptobabilisticallylike ln(n). I have noted that
,=
n
(i)
E(Rn
n
E(Yi) =
and that ln(n) approximatesthe expectation E(Rn) for large n, so that
1
V(Yn)
[E(Rn) ]2
n [In(n) ]2
and consequently
(ii)
E(Y
n
n=1
[E(Rn<
InvokingKolmogorov'sconvergencecriterionfor sums of independentrandomvariables([24] page
238), conditions (i) and (ii) imply that the sequence Rn/E(Rn) -*1; and hence Alfred Renyi [30]
obtained the following "strong law of large numbers" for the frequency of record highs (since
E(Rn )/Iln(n)-) 1):
one as samplesize n --oo,
Rn/Iln(n)- >1 withprobability
which implies the divergence Rn->oo discussedin Section 7.
Similarly,the criterion of Liapounov ([24] page 275) gives a "central limit theorem" for the
frequency of record highs: as sample size n ->o, the limit distributionof (Rn- ln(n))/Vln(n) is
normalwith mean = 0 and variance = 1. (Renyi also gave a "lawof the iteratedlogarithm."Resnick
[33] laterderivedall these limit theoremsfromresultsconcerningcountsin a continuous-timePoisson
process; see the end of Section 12, below.) But "the asymptotic distribution is a very poor
approximationfor n - 1000say, which is the only region of much statisticalimportance"[4].
The exact probabilitythat Rn = r is complicated.Instead,considerR2n- Rn the numberof record
highs over trials n + 1, n + 2,..., 2n. In particular,
P{R2n- Rn = 0} = P{no record high for n + 1 '
i ? 2n}
= P{ max (Xl, . . ., Xn )=max (Xl, . . ,X2n )
= n/2n = 1/2.
In general,the count R2n- Rn has asymptoticallya Poissondistributionwith mean= ln(2): that is, for
k =0,12,
....
P{R2n-
Rn = kj->
(2)1k
asn[ln
2
X
NED GLICK
16
[January
One proof uses the probabilitygeneratingfunctionsgi (s) = E(s Y I)= 1+ (s - 1)/i associatedwith
the binary variates Yi. Since R2 - R= Y.+1 + Y.+2 +
+Y2n, and the binary variates are
independent,this sum has generatingfunction given by
2n
2n
hn(s) = Hl gi(s) = l [1+ (s-1)i].
i=n+l
i=n+l
Take logarithmsof both sides and use the Taylorseries expansionIn(1 + x) = x - x2/2 + x313- * to
obtain
2n
ln[l +(s -1)/i]
In[hn(s)] =
i =n
[
=
=(S-_
s
2i2 +
i
I
1)
i=n+l
i3
I
sH
2
i=n+l
sH
i
i=n+l
As n a0, all of the above series tails vanishexcept for the harmonicinitialterm, whichconvergesto
(s - 1)[ln(2n) - In(n)] = (s - 1)ln(2). This limitis the logarithmof a Poissongeneratingfunctionwith
parameterIn(2)... . The argumentcan be generalizedto prove the followingtheoremfirststated by
Dwass [10]:
indexedbyan < i ' bn (for
As samplesizen --oo, thefrequency
of recordhighsamongobservations
a Poissoncountwithmean ln(b/a).
any b > a > 0) is asymptotically
9. Serialnumbersof recordbreakingtrials.RatherthanRn, the randomnumberof recordhighs
in n trials,considernow the randomserial numberN, of the trialat whichthe rth recordhigh occurs.
as a
The numberN, is calledsimplythe rth recordvaluetime.SinceI countthe initialobservation
recordvalue, R1 = 1 and N1 = 1. In general N, - r and, for n ' r,
P{N, <n} = P{Rn> r}.
In particular,the record time
given by
N2
has probabilitydistribution,over the integers i = 2,3,...,
P{N2= i} = P{X, = max(X1,..., X-l < Xi i
Since this probabilityis strictlydecreasingfor i = 2,3,..., the mode = 2; but the mean is
E(N2)= , iP{N2=i}=
_i 1=a.
Since N2< N3< ..., it follows that E(N,) = 00 for all r ' 2. More surprising,an argumentin the
next section shows that E(N,+i - N,) = a).
To studyjoint distribution
of N,, N,+i I must indicate why the binary variates Y1,Y2,..., Yn
definedin the precedingsectionare independent
(a conditionneededearlierto provethe Renyiand
Dw'asslimit theorems). Consider the event that Y1,Y2,..., Yn include exactly r ones, at trials
1=
il < i2<
...
< i, c n. This event correspondsto
max(X1,..., X,21) <max(Xi2,...
11
Xi
.,
Xi,3) <...
11
Xi2
which is the intersectionof independentevents of the form
<max(Xi,,..., Xn),
11
xi,
1978]
BREAKING RECORDS AND BREAKING BOARDS
17
max(Xl, . .., Xi-,) < max(Xi,. .., Xi) = Xi,
with i = ikl and j = ik - 1 or j = n. This component probabilityis
j-(i-l)
1
1
i
j-(i-1)
j'
so the desired joint probabilityis the product
(i2 - 1)
i3- 1)*(ir - )n
for Y2= 0, Y3= 0,..., Yi2-1=O
It is easyto checkthatthe productof marginal
probabilities
Yi2+1 = 0,... gives the same expression:
1 2 3 i2-2
2 3 4-i2-l
.1
i2
1
i2
i2+1
Yi2 =
1
1
(i2-1)(i3-1)
The joint distributionof serial numbersN2, N3,..., Nr is therefore given by
P{N2= i N3
i3*
**
Nr= i}
=(i2-
1) (i3-1)
***(i
-
1)iri
for 1 < i2 < i3 < ... < ir. The general marginaldistributionis complicated,but expressionshave been
given by variousauthors[7, 19, 30, 37]:
P{Nr= n}- =
ISrn--l I/n!,
wherethe numeratoruses Stirlingnumbersof the firstkind. Renyi [30] and David and Barton[7] used
Stirlingnumbersto show that
P{Rn = r} =
P{Rn = r; N2 =i2,
2
..
2:%i2<k3<
=1
N3 = i3,.. . Nr = ir}
n
E
n 2:,i2<i3<
(r -
cir,
_1
..*<ir:6n
(i2
1)
(i3
1)
(ir
1)
[I (n)r
(n
n)
n(r-1)
for largesamplesize n. But this approximationalso can be derivedusinggeneratingfunctionsas in the
precedingsection. (Or see [21], page 267.)
Renyi [30]treatedlimittheoremsfor Nras dualsof theoremsfor Rn.First,there is a "stronglaw of
large numbers":
In(Nr)
r
1 withprobabilityone as r -aoo;
or, equivalently,
one.
Nl'r e withprobability
This last result says that almost every sequence of chance observations X1,X2,X3,... from a
continuousdistributionwill give an arbitrarilypreciseestimateof the mathematicalconstante, even
when the distributionfunctionis unknownto me! I need only the serial numbersof trials at which
recordhighs occur. (This approachdifferssomewhatfrom statisticaldeterminationsof e or of ir in
experimentssuch as "matching"or "Buffon'sneedle problem"[12]. In those situations,a particular
event probabilityis a functionof e or of ir; and this probabilityis estimatedby the relativefrequency
in repeated trials.)
[January
NED GLICK
18
Also Renyi [30] stated a "central limit theorem" for the random variables Nr: as r --* o, the
distributionof (ln(Nr) - r)I\/r is asymptoticallynormal with mean = 0 and variance= 1.
forN,. Resnick[33]againderivedRenyi'slimit
Renyialsogavea "lawof the iteratedlogarithm"
result.And Vervaat([44] page323)gave a
theoremsfor recordtimesfroma moresophisticated
of Renyi'sN,centrallimittheorem.
Wienerprocessgeneralization
of the secondrecordtime N2, describedat the beginningof this
Returningto the distribution
section,noticethatfor n = 2,3,4, ....
P <2n )
P{N2 > n} = *E
i=n+l
(i
lii=n+l
P{N2 =i}
Jn
Vi1i
N3
A similarresultholdsforrecordtime
andconditionalprobability
of the eventN3> n, giventhat
the precedingrecord time N2= m. More generally, for n > m -' r,
P{N+<m
Nr=m
=P{Nr+I>nlNr=m}
= P{no recordhighfor m + 1 i 'n INr = m}
= P{max (Xi,.. .,Xm )=max(Xi,...,Xn)INr
= m}
= m/n.
by a rationalratiomr/n;andTata[43]gavethe
Anyrealx in the interval(0,1) canbe approximated
limit
theorem:
following
The distributionof the ratio NrINr+iis asymptoticallyuniformover the unit interval: that is, for
0< x < 1,
P{$
<x1-x
as r-oo.
section.
Theproofsuggestsdualitybetweenthisresultandthe theoremof Dwassin thepreceding
Giventhatthe rtbrecordhighoccursat trialN, the conditionalprobability
P{N
<
Nr =P{Nr+i>2Nr,Nr}
= P{no recordhighfor Nr+ 1 i -' 2N, IN,}
= P{R2N,-
RN, = O|N}Nr= Nr2Nr
= 1/2,
of N,. Similarly,for 0 < x < 1,
independent
P{Nr+<XjN
=P{Nr+i>
XN Nr
[NrX
as r -> oo, independentof Nr. (Here [N,/xJ means "the largest integer-- NrIx.")
Theargument
alsoindicates,asShorrock[37]andlaterResnick[33]proved,thatsuccessiveratios
... are asymptoticallyindependentuniformvariates.In other words, Shorrock's
N,/N,+1, Nr+I/Nr+2,
a uniform
can be used to approximate
theoremshowsthat an unknown continuousdistribution
randomnumbergenerator!I needonlythe numbersof the trialsat whichrecordhighsoccurin the
originalrandomlysampled X1,X2, X3,....
1978]
BREAKING RECORDS AND BREAKING BOARDS
19
Tata's argument requires the asymptotic probabilitythat the count (R[n,XJ- Rn) = 0. In the
precedingsection,however,the theoremof Dwassgives more:the asymptoticPoissonprobabilitythat
Rn)= k, for k = 0, 1, 2.
(R[n/xTherefore, for any positive integer s and real x in the
unit interval,
p{ Nr
<x
N)
>/
P{Nrs
NrJ=
< SIN}
-RN
P{R[N/
s-1
s-1
=
k=O
=
-RN,
PIR[N,x]
k INI,} >x
k=O
(-In (x))'l k!
as r -> 00, independent of N,. Thus a dual of the Dwass theorem generalizes Tata's result: for
s=1,2,3,...
andO<x<1,
4p$ii
as r -*oo.
<x1--*
x>f(-ln(x))kIk
)
k=O(
()/
{N,+s
10. Waiting times between record breaking trials. Now consider the wait from the rth to the
(r + 1)Strecord breakingtrial: denote this inter-recordwaitingtime by Wr= Nr, - N,.
Record breakingmust occur infinitelyoften; but it turns out that the inter-recordwaiting time
distributionshave mean = 00, althoughmode = 1. These results,first proved by Chandler[6], can be
deduced from independenceof the binary variates Y1, Y2,... via the following arguments.
P{Wr= k IN;}j= P{Nr+l= Nr+ k INr}I
= P{ Y, =O for Nr+ 1- i N +k-1,YN,+k=
Nr + 1
Nr+ k-2
Nr+1 Nr+2
Nr+ k-1
Nr
1
Nr+k
1 |
Nr}
N,
(Nr+ k-1)(Nr + k)
This expression is monotone decreasingas k increases; so, for any value of N, P {Wr= 1 N,}
P{Wr= k IN }, for k = 1, 2, 3,.
Taking expectations,P{ Wr= 1}> P{W = k}; that is, the most
probablevalue or mode = 1.
P{N, = i, Wr= k} = P{N, = i, Nr+l= i + k}
=?P{N2= 2, N3= 3,..., Nr 1= r-1,Nr = i, N+1= i + k}
1
(2)(3)
(r-2)(i-1)(i+k-1)(i+k)
1
1
_
=r!(i-1) Vk+i-1
1
k +iJ
and hence, for any integers m and i ' r,
P{Nr=
i r>
} '!
(i -
1) kl=m(
k + i-1
k + i)
1
1 _
1
r!(i-1)(i-1+m)
i-l
r!m
1
i-l+m
The marginaldistributionof W4satisfies
P{Wr m}l=
P{N, = i Wr> m}l
1
r!.m
r+m-1
=r
i-l
1
r!mi=
(-
i
m
[January
NED GLICK
20
and therefore, for arbitrarilylarge integers m,
00
E(W,)= >
kP{W, = k}?- mP{W, -' m}
k=1
1 r+m-1
=r!
-
i=r
It follows from the divergenceof the harmonicseries that E(W,) = a).
In other words, if I pay a fixed fee and receive in returna variabledollar amount equal to the
waitingtime W, then my expectedgain is positiveand my gambleis not a "fairgame"no matterhow
high the fee.
An alternate demonstration that E(Wr) = 00 assumes that observations X1, X2, X3,... have
exponentialdistributionF(x) = P{Xi < x} = 1 - e-X, for x > 0. This assumptionis.harmlessbecause
the precedingsection shows that record times N, and hence inter-recordwaiting times Wr,have
distributionswhich do not depend on F (only that it be continuous).The conditionalprobabilitythat
waitingtime Wr> m, given that the rthrecordvalue XN,= x, is just the probabilitythat independent
Xi <x for indices Nr + 1' i- Nr + m; viz., conditional probabilityis [F(x)]m = (1-e- x)m for
exponential sampling. Moreover, Section 11 shows that the record value XN, from exponential
samplinghas a gammadensity.Hence the unconditionalprobabilitythat Wr> m can be represented
(independentof the distributionF) as the integralof (1 - e-x )m with respectto a gammadensity:
P{Wr > m IXN, = x}dP{XN,
P{Wr >m}=
(1 - e-x
= X}
e-xdxI(r - 1)!
)mxr-l
or
P{Wr = m}= P{Wr > m - 1}- P{Wr > m}
J
[(1_e-x
)m1-(1-
ex)m
]xrl
e-xdxl(r
-
1)!.
Notice that the geometric series
E m[(I-e-x)M-t- le-x
)m=
m=l
E (I -e-x )m-l
m=1
1- (1- ex)ex;
so (as already showed by a differentargument)the expectation
E(Wr) =
,
PW,
=m}l=JXd
(I am tryingto avoidintegralcalculusin this paper,but the precedingargumentis too ingenious
to omit.)
For large values of r, the gammacan be approximatedby a normaldensity. Neuts [27, 28] used
such approximationin the integralexpressionfor P{ Wr> m } to prove a "law of large numbers"for
waitingtimes:as r -
,
In(W)
r
1978]
21
BREAKING RECORDS AND BREAKING BOARDS
in probability(also: with probabilityone [20]). By the same method, Neuts gave a "centrallimit
theorem":as r -* 00, the distributionof (in(Wr)- r)IVr is asymptoticallynormalwith mean = 0 and
variance= 1. There is also a "law of the iterated logarithm"for waiting times [42]. Shorrock([36]
page 222) and Resnick ([33] page 867) include these limit results in more general theorems for
inter-recordwaiting times.
The remarkablesimilaritybetween limit theoremsfor recordtimes and for inter-recordwaiting
timessuggeststhatthe wait Wrfor the next recordvalue is somehowcomparablein scale to the sumof
all pastwaitingtimesWI+ W2+ ' ' ' + Wri, = Nr - 1. Resnick[33]connectedthe limitbehaviourof
Wrand Nr in the following theorem: with probabilityone,
- In(Nr)L|=1
lim
sup
su Iln(Wr)(r)
irn--
In
Since Wr er in some probabilisticlimit sense, successive waits must have increasingmedian
values, althoughthey all have mode = 1 and mean= mo.Numericcomputations[19] show that
median(Wr+i
e= 2.718 ...
median(Wr)
even for r = 455, 6,7, 8. Note that "smallr" does not mean "smallsamplesize": Table C shows that
fewer than 8 record highs are expected in a sample of size n = 1000.
r
2
3
4
5
6
median (W,)
4
10
26
69
183
490
1316
2.50
2.60
2.65
2.65
2.68
2.69
med(-W,)/med (W,_,)
7
8
The end of the precedingsection found independentuniformlimit distributionsfor NrINr+, and
00, the waiting time ratio
Nr+iINr+2;and hence, as r --
Wr+i
Wr
Nr+1 (Nr+2INr+i) - i
Nr+2 -
Nr+1- Nr
1- (NrINr+i)
asymptoticallyhas the same distributionas
- 1
(1/V?- 1 Vor
U L
1-
U
where U and V areindependentrandomvariablesuniformover the interval(0, 1). Forany x > 0,
P{i
,
>x
=
=P{VV<(xU+1yl}
=1
xdu
J
I
dvdu
ln(l+x)
x JOxu + 1
x
Thus Shorrock[37] obtained the following limit result for ratios of waiting times: for any x >0,
{
W ,x
Jr
-*
x
as r-oo.
11. The record value sequence. Little has been said so far about actual record values, i.e.,
magnitudes.Rather I have focused on frequenciesand serial positions of record values.
Resultsconcerningthe recordfrequencyRn in a randomsampleX1,X2,. . ., X,, or concerningthe
22
NED
[January
GLICK
index numberNr of the rth record value trial are distribution-freeresults:they presumeidentically
distributed and independent (or exchangeable) random variables sampled from a continuous
probabilitydistribution;but the distributionfunction F never appearsin the theorems.
The actual record values X1< XN2< XN3
< XN4< ... present a quite contrarysituation.Results
concerningthis recordvalue sequenceperse definitelydependon the particulardistributionF, but do
not involve positions in the originalrandomsequence.
Many propertiesof the recordvalue sequence from continuousdistributionF can be stated most
convenientlyin terms of the transform
G(x)= -ln[1-
F(x)].
The derivativeG'(x) = F'(x)I[1 - F(x)] is the failurerate or hazardrate whichplaysa centralrole in
mathematicaltheory of reliability(see [2]p. 53). Both F and G are strictlyincreasingfunctionsover
their supportintervals,so there exist respectiveinverses F1 and G',
G1 (x) = F-'(1-
e-x ).
Specificexampleswill illustratethe distinctionbetween distribution-freeresultsfor recordvalue
timesin randomsamplingand distribution-dependent
resultsfor the recordvalue sequenceitself. The
theoremof Dwass at the end of Section 8 assertsthat, for 0 < a < b, the count of recordvalue trials
indexed between an and bn has asymptoticallya Poisson distributionwith mean In(b/a). This result
clearlyis asymptotic(validas n -m oo)since there can be at most (b - a)n recordindicesbetween an
and bn, while Poisson distributionpermitsan arbitrarilylarge frequency.By contrast,for any a <,p
< ** can take values
such that0 < F(a) < F(/3) < 1, arbitrarilymany of the recordsX1 < XN2< XN3
between a and P3.Dwass [10] proved that this count is exactly Poisson distributed,but with mean
G(P)- G(a) = - ln{[1 - F(f3)]/[1 - F(a)]}, dependingon the distributionfunction F.
The fact that exact distributionsare obtainedmore easily than most limit resultsfor recordvalues
is a furthercontrastto results for record value times.
The distributionof the rthrecordvalue was requiredin the precedingsection'scalculusargument
concerningwaitingtime W,.Let positiveintegerswl, w2,.. ., w,-, denote fixedwaits,and specifyfixed
Consider a random sample X1,...,
trial numbers n2 =1+w1, n3=n2 +w2,..., n,=n, l-+w,l.
Xn2,..., X..,... such that
Xi = x1> next (w, - 1) observations
Xn2= x2 > next (w2 - 1) observations
X., = X,.
withP{Xi <x} = F(x), the event
SinceX1,X2,X3,... are independentand identicallydistributed
above has probability
dF(x1)[F(x)] (w 1)
x
dF(X2) [F(X2)](w21)
? dF(xr-1) [F(xr-1)]'
Il
x dF(Xr),
where dF(x) = F'(x)dx and the derivativeF' is a probabilitydensityfunction.For increasingvalues
XI <X2< .. < Xr, the event above is equivalentto
BREAKING RECORDS AND BREAKING BOARDS
1978]
23
W1= W1,
X1=X1,
XN2 = X2, W2 = W2,
X__
=
XN,
= Xr
Xr-i,
Wr-i
= Wr-i,
Hence summationof the foregoingprobabilityelement over all possible waitingtimes gives the joint
probabilitythat XI = x, XN2= x2,.. ., XN,= Xr.This sum can be expressedas a productof geometric
series:
.. dF(xrl)[F(xr1)](wr-l-')dF(xr)
## * # dF(x1)[F(x1)J(",-1)
Wj=I
W2=1
W_1=1
"I... dF(xr1) E. [F(xrl)]w,-,dF(x,)
=dF(x,)wE[F(xi)]
l=()
W_-l-?
-
dF(x1).
= dG(x,)
dF(xr-) dF(xr)
.dG(x- )dF(xr).
Iteratedintegrationwith respectto x,, x2,.
has probabilityelement
., X,-
dP{XNJ = Xr} =
(over the region x < x2 <K* < xr) shows that XN,
G
[(
iix
dF(xr).
This argumentwas suggested by Karlin in a textbook exercise ([21] pages 267-268; see also [31]
page 69).
If observations X1,X2,X3,... have exponential distribution F(x) = 1 - e-x and G(x) =
-ln[1 - F(x)] = x, then the rIbrecord value has gamma probabilitydensity Xr-l e-xI(r - 1)!, as
asserted in the precedingsection.
Moreover, for X from any continuous distributionF, it is easy to show that the transformed
variableF(X) has uniformdistributionon the unit interval([21] page 237) and G(X) has standard
exponential distribution.Since G is strictly increasingover its support, the record value trials in
randomsamplingX1,X2,XA3,
... from F correspondto recordvalue trialsin a sample G(X1), G(X2),
G(X3),... from the exponential distribution.Thus transformedrecord value G(XN,) has gamma
distributionwith r degrees of freedom. For large r the gammadistributionis approximatelynormal.
Thus Resnick [31] obtaineda "centrallimit theorem":as r-oo, the distributionof (G(XNr) - r)IV/r
is asymptoticallynormal with mean = 0 and variance= 1. The corresponding"strong law of large
numbers"asserts that
G (XNr)Ir
-)/
1 withprobabilityone as r -m oo.
Resnick [31] (also see Shorrock [37]) gave explicit conditions on the function G which are
necessaryand sufficientfor convergencesin probability
XN,- G (r)-*O,
XNr/G1(r)- 1.
But also there are conditions under which the last ratio converges to a non-degeneraterandom
variableratherthan to a constant.Indeed, Resnick [31] characterizedthe types of limit distributions
for record values XNr; dependingon the sampled distributionfunction F, a record value sequence
satisfiesexactly one of the following convergencesin distributionas r -- 00:
[January
NED GLICK
24
W
(ii)
p-{G
P{G-'(r)
xN,
(r+r<
<r
x
}
i
x-'O
X [a In(x)]
<x
~
~
~
x?
x
x'O
XN,~~~~~~1
where X denotes the standardnormaldistributionfunction,a is a positiveconstantdependingon F,
and x in (iii) is the upper (necessarilyfinite) endpoint of the support intervalof distributionF.
I omit more precisestatementsof limit theoremsfor recordvalue sequencesbecausethese results
involve complicatedconditionson F or G. For details see Resnick [31] and also [9, 16, 32, 34, 35, 36,
37, 38, 39, 40, 43, 44].
12. Extreme values and extremal processes.Every random sample X1,XA2,...,X,, has a sample
maximum or upperextremevalue M,, = max(Xl, X2,.. ., X,,). Clearly M1= X1 and subsequently
every new maximumis a recordvalue, i.e., recordbreakingcorrespondsto a jump or strictinequality
Mn< Mn,+in the sequence of sample maximaXI -' M2 -<M3-' M4 ' *- - .
Thusa recordvalue sequencecan be extractedfroma maximalsequencewhichis alreadyremoved
fromthe randomsequence;but the conversepathis impossible.So maximalsequencesmightlogically
be studied before record value sequences and such was the historicalprecedence, contraryto my
arrangementof topics.Gumbel[18]has given the earlyhistoryandextensivebibliographyon statistics
of extremes, includingdistinctiveapplications:for example, using a river'spast flood levels (annual
maximaof dailyobservations)to plan dams,etc., with sensibleallowancefor worsefloodsin the future
(see [18], page 236, for analysisof MississippiRiver flooding at Vicksburg).
Historically,studiesof samplemaximaalso have been concomitantto the more generalsubjectof
orderstatistics[8], viz., randomlysampleddata filed or rankedfromsmallestvalue to largestvalue. In
the terms of statistical decision theory, ranked data constitute "sufficient statistics" both for
"classical" procedures and for most "nonparametric"methods (of which the tolerance limit
constructionin Section 6 is one example).Unlike the frequencyof recordbreaking,commonsample
statistics,such as mean or medianannualrainfall,can be computedfromordereddata as well as from
values in their randomsamplingsequence (in fact, rankingis generally the fastest way to find the
median or other sample percentiles).
In 1943B. V. Gnedenko [17] showed that there are preciselythree types of limit distributionsfor
sequences of sample maxima. Gnedenko's three types of extremevalue distributionscorrespond
exactly to the three limit laws for record values publishedby Resnick [31] in 1973 and mentioned
brieflyin the precedingsection. The limit laws for maximaand for record values from continuous
distributionF are linked by Resnick in a duality theorem. That is, the different extreme value
distributionspartitionthe space of all continuous distributionfunctions into disjoint "domainsof
attraction,"and recordvalue distributionsdetermineexactly the same partition.See [321for further
comparisonof record values and maxima.
*
*
*
A sequence of sample maximaMnplotted as a functionof sample size n can be regarded(see
Shorrock[39]) as a discrete-timeMarkov process with jumps at record value times N,. Common
techniques in stochastic process theory can be used to construct an analogous continuous-time
1978]
25
BREAKING RECORDS AND BREAKING BOARDS
Markov jump process M(t) such that, for any times 0 ' tl < t2 < . . . < tn, the variables M(t,) '
<M(t ) have the same joint distributionas sample maxima M, <M2
M(t2)
M,. The
studyof such continuous-timeextremalprocessesseems to have been originatedby Dwass [10, 11] and
Lamperti[23] in the mid-1960s.
Dwass used recordvalue theoremsto "motivatesome of the results"for extremalprocesses.But,
conversely,Shorrock[38, 39] and Resnick[33,34, 35] used continuous-timeprocessesto obtainresults
for record values. In particular,Resnick [33] showed that a continuous-timeextremalprocess M(t)
with jump times accordingto a non-homogeneousPoisson process(with intensityt-') has a "discrete
skeleton" M(n), n = 1,2,3,..., whose jump times behave precisely as record value times (for n
sufficientlylarge).
Thus Resnick [33] used the frameworkof extremal processes "to give a unified explanationof
known limit laws" for recordfrequenciesRnfor recordvalue trial numbersNr, and for inter-record
waiting times Wr.
_
.
This study was supported by the National Research Council of Canada, Grant A8044.
References
Asterisks indicate references most relevant to the theory of record values.
1. American Society for Testing and Materials (1970) "Tentative method for evaluating allowable properties
for grades of structural lumber". 1916 Race Street, Philadelphia, Pa. 19103.
2. R. E. Barlow and F. Proschan, Statistical Theory of Reliability and Life Testing, Holt, Rinehart & Winston,
New York, 1975.
3.* D. E. Barton and C. L. Mallows, The randomization bases of the problems of the amalgamation of
weighted means, J. Roy. Statist. Soc., Ser. B, 23 (1961) 423-433.
~ and
, Some aspects of the random sequence, Ann. Math. Statist., 36 (1965) 236-260.
4.*
5. British Columbia Department of Agriculture, Climate of British Columbia: tables of temperature,
precipitation, and sunshine, Annual reports for 1960-1974. Victoria, B.C., Canada.
6.* K. N. Chandler, The distribution and frequency of record values, J. Roy. Statist. Soc., Ser. B, 14 (1952)
220-228.
7.* F. N. David and D. E. Barton, Combinatorial Chance, Hafner, New York, 1962, pp. 178-183.
8. H. A. David, Order Statistics, Wiley, New York, 1970.
9. L. de Haan and S. I. Resnick, Almost sure limit points of record values, J. Appl. Probability, 10 (1973)
528-542.
10.* M. Dwass, Extremal processes, Ann. Math. Statist., 35 (1964) 1718-1725.
11.
, Extremal processes, II. Illinois J. Math., 10 (1966) 381-391.
12. W. Feller, An Introduction to Probability Theory and Its Applications, Volume I, 3rd ed. Wiley, New
York, 1968.
13.*
, An Introduction to Probability Theory and Its Applications, Volume II, 2nd ed., Wiley, New
York, 1971.
14.* F. G. Foster and A. Stuart, Distribution-free tests in time-series based on the breaking of records, with
discussion, J. Roy. Statist. Soc., Ser. B, 16 (1954) 1-22.
15.* F. G. Foster and D. Teichroew, A sampling experiment on the powers of the records tests for trend in a
time series, J. Roy. Statist. Soc., Ser. B, 17 (1955) 115-121.
16.* W. Freudenberg and D. Szynal, Limit laws for a random number of record values, Bull. Acad. Polon. Sci.
Ser. Sci. Math., Astr. Phys., 24 (1976) 193-199.
17. B. V. Gnedenko, Sur la distribution limite du terme maximum d'une serie aleatoire, Ann. of Math., 44
(1943) 423-453.
18. E. J. Gumbel, Statistics of Extremes, Columbia University Press, New York, 1958.
19.* D. Haghighi-Talab and C. Wright, On the distribution of records in a finite sequence of observations, with
an application to a road traffic problem, J. Appl. Probability, 10 (1973) 556-571.
20.* P. T. Holmes and W. E. Strawderman, A note on the waiting times between record observations, J. Appl.
Probability, 6 (1969) 711-714.
21.* S. Karlin, A First Course in Stochastic Processes, Chapter 9. Academic Press, New York, 1966.
22. W. Kruskal, Statistics, Moliere and Henry Adams, American Scientist, 55 (1967) 416-428.
23. J. Lamporti, On extreme order statistics, Ann. Math. Statist., 35 (1964) 1726-1737.
24. M. Loeve, Probability Theory, 3rd ed. Van Nostrand, Princeton, N.J., 1963.
26
HILLMAN-ALEXANDERSON-KLOSINSKI
[January
25. Meteorological Branch, Annual meteorological summary with comparative data: Vancouver Airport
1937-1964; Vancouver City 1900-1964. Department of Transport, Ottawa, Canada, 1964.
26. R. B. Murphy, Non-parametric tolerance limits, Ann. Math. Statist., 19 (1948) 581-589.
27.* M. Neuts, Waiting times between record observations, J. Appl. Probability, 4 (1967) 206-208.
, Probability, Allyn & Bacon, Boston, 1973, pp. 299-300, 432.
28.*
29.* G. F. Newell, A theory of platoon formation in tunnel traffic, Operations Res., 7 (1959) 589-598.
30.* A. Renyi, Theorie des elements saillants d'une suite d'observations, with summary in English.
Colloquium on Combinatorial Methods in Probability Theory, (1962) 104-117. Matematisk Institut, Aarhus
Universitet, Denmark.
31.* S. I. Resnick, Limit laws for record values, Stochastic Processes and Their Applications, 1 (1973) 67-82.
, Record values and maxima, Annals of Prob., 1 (1973) 650-662.
32.*
, Extremal processes and record value times, J. Appl. Probability, 10 (1973) 864-868.
33.*
, Weak convergence to extremal processes, Annals of Prob., 3 (1975) 951-960.
34.*
35.* S. I. Resnick and M. Rubinovitch, The structure of extremal processes, Advances in Appl. Probability, 5
(1973) 287-307.
36.* R. W. Shorrock, A limit theorem for inter-record times, J. Appl. Probability, 9 (1972) 219-223. Correction
on page 877.
, On record values and record times, J. Appl. Probability, 9 (1972) 316-326.
37.*
, Record values and inter-record times, J. Appl. Probability, 10 (1973) 543-555.
38.*
, On discrete time extremal processes, Advances in Appl. Probability, 6 (1974) 580-592.
39.*
40.* M. M. Siddiqui and R. W. Biondini, The joint distribution of record values and inter-record times, Annals
of Prob., 3 (1975) 1012-1013.
41. P. N. Somerville, Tables for obtaining non-parametric tolerance limits, Ann. Math. Statist., 29 (1958)
599-601.
42.* W. E. Strawderman and P. T. Holmes, On the law of the iterated logarithm for inter-record times, J.
Appl. Probability, 7 (1970) 432-439.
43.* M. N. Tata, On outstanding values in a sequence of random variables, Zeitschrift fir Wahrscheinlichkeitstheorie und verwandte Gebiete, 12 (1969) 9-20.
44.* W. Vervaat, Limit theorems for records from discrete distributions, Stochastic Processes and Their
Applications, 1 (1973) 317-334.
45. S. S. Wilks, Determination of sample sizes for setting tolerance limits, Ann. Math. Statist., 12 (1941) 91-96.
DEPARTMENT
OF MATHEMATICS,
AND DEPARTMENT
OF HEALTH
CARE AND EPIDEMIOLOGY,
UNIVERSITY
OF
BRITISH COLUMBIA, VANCOUVER, BR. COLUMBIA V6T 1W5, CANADA.
THE WILLIAMLOWELLPUTNAMMATHEMATICALCOMPETITION
A. P. HILLMAN, G. L. ALEXANDERSON,
L. F. KLOSINSKI
The following results of the thirty-seventhWilliamLowell PutnamMathematicalCompetition,
held on December4, 1976,have been determinedin accordancewith the governingregulations.This
annual contest is supported by the William Lowell Putnam Prize Fund for the Promotion of
Scholarshipleft by Mrs. Putnamin memory of her husbandand is held under the auspicesof the
MathematicalAssociation of America.
The first prize, five hundred dollars, was awardedto the Department of Mathematicsof the
CaliforniaInstitute of Technology,Pasadena, California.The members of its winning team were
ChristopherL. Henley, Karl W. Heuer, and Albert L. Wells, Jr.; each was awardeda prize of one
hundreddollars.
The second prize, four hundred dollars, was awarded to the Department of Mathematicsof
Download