Breaking records and breaking boards

2 NED GLICK [January

CLASSROOM NOTES. For this department we hope to obtain brief papers dealing with the subjects currently taught in undergraduate mathematics. Occasionally we shall also accept notes dealing with material commonly encountered in the first year of graduate study. Classroom notes should be relevant to some actual classroom situation that can be expected to exist at a good number of institutions.

MATHEMATICAL EDUCATION. There is now a growing and very healthy concern with mathematicians' role as teachers. This department welcomes reports of experiments in novel methods of teaching as well as discussions of all educational aspects of our profession.

PROBLEMS. This department attracts many readers and is an especially valuable feature of the MONTHLY. It will continue to seek and publish interesting elementary and advanced problems in pure and applied mathematics, both classical and modern.

REVIEWS. Through the telegraphic reviews the MONTHLY will continue to provide as complete a coverage as possible of the current textbooks. At present the MONTHLY is the only journal that performs this service for the mathematical community. Extended reviews will continue to appear, and we hope that the number of classroom reviews will increase.

R. P. BOAS, Editor

BREAKING RECORDS AND BREAKING BOARDS

NED GLICK

Part I. What a Statistician Can Do With a Minimum of Probability or the Probability of a Minimum
1. Introduction
2. Weather Records
3. Tests of Randomness
4. Car Caravans in a One-Lane Tunnel
5. Sequential Strategy for Destructive Testing
6. Tolerance Limits for Failure Distributions

Part II. A Beginner's Guide to Record Breaking Mathematics
7. Persistence of Record Breaking and Divergence of the Harmonic Series
8. Frequency of Record Breaking
9. Serial Numbers of Record Breaking Trials
10. Waiting Times Between Record Breaking Trials
11. The Record Value Sequence
12. Extreme Values and Extremal Processes

1. Introduction. When I take observations in chronological sequence, how often will the outstanding record value be surpassed? For example, suppose that I register the annual total inches of precipitation or, to take a less gloomy statistic, the total hours of bright sunshine in Vancouver, where I live: what is the probability that next year will be a new maximum?

Breakthroughs are less likely later than early in a sequence of observations. The first observation necessarily must be a "record high." But, prior to observing any values, I know that the second of two numbers in random sequence has equal probability of being smaller or larger than the first. Hence the probability is exactly 50% that a second, independent observation will be a new record high surpassing the initial record, assuming that there cannot be an exact tie (if measurement is arbitrarily precise).

1978] BREAKING RECORDS AND BREAKING BOARDS 3

From the same perspective, there is probability 1/3 that a third trial will be a new maximum, since the last of three repeated observations is equally likely to be smallest, middle, or largest.... Similarly, all 10 ranks are equally likely for the tenth observation; so maximum rank for the tenth observation has probability 1/10.

The theoretical expected or average number of record highs in a chronological sequence of n independent observations is the sum of these probabilities: 1 + (1/2) + (1/3) + ... + (1/n).

This result surprises my acquaintances who have no background in probability theory. I have here the beginning of a conversational ploy to hold a person's interest when she or he asks what work I do and I say, "I am a statistician." Otherwise this reply may ruin a fine conversation, as in the dinner scene depicted by William Kruskal [22].

My ploy proceeds to verify predictions from the simple probability model with some real weather records. It happens that these weather records excellently illustrate the pathos and problems of work with statistical data.
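The rank argument can be checked numerically. The following Python sketch (the function names are illustrative and do not come from the paper) evaluates the harmonic sum 1 + (1/2) + ... + (1/n) and, for a small n, averages the count of record highs over every possible ordering of n distinct values. Assuming no ties, the two results should agree.

```python
import math
from fractions import Fraction
from itertools import permutations


def expected_record_highs(n):
    """Harmonic sum 1 + 1/2 + ... + 1/n: expected record highs in n trials."""
    return math.fsum(1.0 / i for i in range(1, n + 1))


def count_record_highs(values):
    """Count the values that strictly exceed everything observed before them."""
    count = 0
    best = None
    for v in values:
        if best is None or v > best:
            count += 1
            best = v
    return count


def average_over_orderings(n):
    """Exact average record count over all n! orderings of n distinct values."""
    orderings = list(permutations(range(n)))
    total = sum(count_record_highs(p) for p in orderings)
    return Fraction(total, len(orderings))


if __name__ == "__main__":
    print(average_over_orderings(4))            # exact average for n = 4
    print(round(expected_record_highs(65), 2))  # 65 yearly observations
```

Enumerating orderings is only practical for small n; for longer sequences the harmonic sum itself is the quantity to compute.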
It is just as interesting to note that the simple model does not fit record breaking in athletic competitions. Race times, jump heights, and throw distances have improved over several decades, while rainfall fluctuations from year to year are "random." In fact, the frequencies of record highs and lows can be used to infer whether observations indicate linear trend or random sequence.

After the weather and sports, I talk about traffic. Curiously, the simple model of random record values says something about how cars tend to bunch together behind a slow vehicle.

The same probability model applies in yet another context: a sequential strategy to find the weakest item in a sample of boards or beams, and hence to establish "tolerance limits" for lumber strengths. My interest in record highs and lows actually began in discussions at the Western Forest Products Laboratory of the Canadian Forestry Service. Breaking a random, but usually small, fraction of the available beams can accomplish the same purpose as 100% destructive laboratory testing. This conservation of material illustrates the economic spirit of experimental design.

So one example, the frequency of record highs or lows, ties together several "applied" aspects of a statistician's livelihood. More surprising, the intuitive idea that any record can be beaten also leads to mathematical proof that the harmonic sum 1 + (1/2) + (1/3) + ... grows without bound, becoming bigger than any finite number.

Harmonic divergence is the simplest of many "well known" limit theorems and paradoxes related to record breaking. The mathematical sections comprising the second part of my paper review primarily those results in the theory of record values which do not depend on the particular distribution of the basic sequence of observations. This mathematics is mostly at the level of an undergraduate discrete probability course in the spirit of William Feller's famous text [12]. I avoid differentiation and integration as much as possible.

2. Weather records.
Weather records in Canada are kept by the Meteorological Branch of the federal Department of Transport. Vancouver's weather records, however, are largely products of "the colourful careers of the two Shearman brothers": T. S. H. Shearman, who "was officially appointed by the government as weather observer for the city of Vancouver" in April, 1905, and E. B. Shearman, who replaced his brother from 1915 to 1948, for which service he received the British Empire Medal. This history I found in a Department of Transport 1964 Annual Meteorological Summary [25], from which most of my figures are taken.

Monthly as well as annual inches of precipitation and hours of sunshine are shown in Data Set 1 and Data Set 2, respectively. Looking down the February column, for example, I find 5 record high years in the precipitation data (1900, 1901, 1902, 1918, 1961) and 5 in the sunshine data. Each sequence includes 65 observations; so 1 + (1/2) + (1/3) + ... + (1/65) = 4.76 is the theoretical expected number of record highs. The 12 monthly precipitation sequences actually give an empirical average of 4.17 highs per sequence.

Record lows have the same distribution as highs, if the case of absolutely nil precipitation

DATA SET 1. Monthly & Annual Total Precipitation (Rain + One-Tenth Snow) in Vancouver, B.C., in Inches

Year
1900 1901 1902 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953
Jan.
7.24 11.28 6.68 8.23 9.66 9.32 7.60 6.20 11.19 6.11 8.47 9.62 10.56 7.13 5.96 9.33 11.05 7.57 8.92 9.39 3.15 8.73 8.26 12.16 7.62 9.08 9.04 3.05 2.75 11.24 9.18 9.57 11.90 20.65 9.37 2.42 6.59 11.91 5.65 8.04 3.56 4.25 7.85 9.33 11.30 10.68 3.76 .84 6.48 11.10 8.23 14.08
Feb. Mar. Apr.
5.95 10.29 4.51 6.31 3.04 5.29 10.17 7.45 3.11 2.60 6.88 3.78 6.03 2.37 1.04 8.30 2.39 4.13 6.31 7.14 2.61 8.15 4.31 1.23 5.01 2.91 3.60 3.37 3.05 1.96 .89 3.92 6.25 4.38 5.38 2.53 4.87 3.33 3.28 4.42 4.18 3.04 7.40 14.55 4.07 5.87 5.61 8.20 10.50 7.48 1.70 7.35 6.61 4.47 1.21 5.20 2.51 6.66 2.53 3.62 4.75 3.44 2.63 4.06 3.86 2.14 8.86 1.16 3.84 6.01 4.24 2.44 6.70 2.48 2.58 4.73 6.80 1.88 1.87 7.01 4.29 1.64 4.47 4.81 9.42 3.33 4.72 4.95 6.35 4.57 6.65 9.15 4.84 6.32 6.43 .53 3.02 5.57 1.18 3.75 8.41 2.28 4.37 6.12 3.31 8.62 3.64 7.16 4.68 4.65 2.93 5.94 3.83 1.71 8.11 7.39 4.57 6.24 3.82 2.59 2.82 4.25 3.88 4.69 3.94 4.16 4.63 3.41 4.15 7.45 8.54 4.23 9.36 8.79 7.54 7.02 6.52 5.62 10.31 2.67 3.59 7.58 5.05 2.22 10.07 9.42 5.01 10.28 6.40 2.82 5.55 5.99 2.72 4.79 4.93 2.38 May June July 4.20 5.42 4.38 5.01 4.40 1.97 3.68 3.56 3.58 3.04 1.44 1.43 4.11 1.86 3.76 1.69 2.15 1.98 5.39 2.09 2.35 2.28 4.33 3.81 .74 3.58 3.42 1.07 1.41 1.34 1.69 5.40 1.15 1.00 3.60 1.02 1.94 3.06 2.52 3.64 .17 2.46 2.84 2.07 .91 .31 .38 2.20 4.17 .78 5.12 1.16 2.22 1.93 1.25 3.24 2.86 2.18 1.23 5.59 1.34 2.08 4.33 2.03 .69 3.44 .55 1.06 5.23 3.13 3.14 6.14 .78 1.86 3.40 3.03 .30 2.85 5.73 2.19 1.89 5.28 3.28 1.20 1.90 .95 .94 2.64 .43 4.87 1.74 2.08 6.05 11.82 1.73 1.76 2.61 1.48 .37 3.50 2.23 3.92 2.31 2.26 1.05 83 2.37 1.12 .45 1.70 1.59 2.45 .24 .92 1.54 2.02 .42 .91 5.25 .48 2.29 .15 .67 .32 .02 .52 .71 .74 .36 .94 .47 1.41 .08 .44 5.32 1.76 1.86 1.79 1.65 .47 .66 2.34 1.52 .48 3.36 1.10 .49 .91 1.63 2.34 2.16 2.75 2.15 .01 .40 1.29 Aug. 3.60 .22 1.15 1.07 .83 1.36 1.15 1.43 1.38 1.23 5.86 .85 .75 .36 .58 .93 4.59 1.15 2.91 2.84 2.01 .73 1.88 2.36 2.10 3.74 .20 1.50 .07 .61 2.01 1.12 1.24 1.36 2.11 3.45 .74 .68 1.70 2.12 .47 2.12 .92 1.27 .78 .49 4.03 1.46 2.93 .90 1.27 1.83 Sept. 
1.61 2.65 3.39 8.35 8.87 4.51 1.46 2.23 2.47 4.41 2.84 3.89 6.86 .80 1.28 3.30 .30 1.16 10.37 5.03 5.76 2.97 5.81 .44 3.26 3.07 1.35 1.77 2.65 7.14 2.62 5.89 2.77 2.57 3.51 1.84 1.68 3.20 2.15 8.26 .79 1.82 4.86 3.21 1.94 2.18 2.83 1.45 1.65 3.64 1.08 4.76 Oct. Nov. Dec. Annual 9.20 5.20 4.72 5.72 7.60 1.76 6.68 7.06 9.04 2.24 4.64 6.19 6.37 8.83 2.16 3.49 7.56 3.14 8.34 10.08 3.26 2.56 6.56 3.00 6.37 7.22 7.38 3.57 7.50 5.22 5.66 7.94 4.99 6.70 4.21 8.14 5.74 6.72 10.85 9.86 6.20 7.16 5.87 7.10 5.91 10.29 4.69 6.83 10.24 5.66 2.13 5.26 10.00 14.06 10.33 11.36 8.25 13.23 13.69 15.66 10.62 12.68 9.21 10.08 10.18 5.41 6.37 5.23 7.02 11.25 7.14 8.99 2.63 6.13 5.87 4.05 7.80 8.39 5.36 2.54 3.00 8.34 10.17 5.45 9.56 4.65 1.84 11.01 6.44 12.35 4.91 6.93 6.11 3.29 8.97 10.07 6.19 5.22 14.57 11.89 5.01 6.71 2.34 10.44 9.22 8.09 9.55 4.21 7.33 8.02 8.41 4.28 8.79 8.82 8.80 3.95 2.84 10.36 5.71 11.72 8.07 9.74 11.01 5.56 10.35 15.88 8.35 13.05 8.99 6.92 5.34 8.58 5.22 11.92 7.37 12.85 12.27 8.46 10.63 10.94 13.53 11.78 10.47 9.15 9.59 9.16 3.76 6.45 8.40 13.32 9.59 7.62 10.50 6.17 7.97 11.28 72.29 66.36 65.29 60.56 59.05 57.59 62.61 58.45 59.38 52.27 57.05 57.03 53.78 49.93 56.08 61.25 62.71 57.21 63.28 61.18 40.63 52.49 52.52 51.07 53.21 59.05 46.46 37.83 43.78 67.60 66.39 64.22 58.49 62.23 55.48 66.97 50.28 66.89 60.47 65.41 48.20 46.17 47.76 62.14 67.14 67.50 66.07 51.18 67.55 57.56 43.83 65.61 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 BREAKING RECORDS AND BREAKING BOARDS 19781 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 10.00 8.39 5.09 4.17 7.10 6.47 3.52 3.79 13.48 6.75 8.51 6.83 7.43 7.04 13.28 15.26 7.57 2.37 1.76 7.53 11.62 3.82 9.54 10.67 9.28 4.53 2.58 7.41 7.72 6.96 3.51 8.97 5.49 6.04 4.35 4.29 6.44 2.30 4.87 3.92 4.32 1.12 2.87 3.60 3.66 3.13 2.98 5.02 4.09 3.27 2.48 1.75 2.13 3.22 .84 1.72 1.33 2.78 5.64 3.96 2.87 1.81 2.75 2.89 2.89 
2.79 3.03 6.57 2.99 1.51 3.94 2.05 1.42 1.42 2.24 2.58 0.62 2.18 2.11 2.97 .89 2.81 Nil .97 .02 1.34 1.20 3.08 3.56 0.51 3.36 4.35 .31 2.52 1.84 2.35 .80 4.15 3.56 5.35 .83 2.33 2.57 1.66 4.45 1.87 5.65 1.72 3.10 7.03 2.08 1.94 3.12 1.86 7.41 0.67 3.17 3.15 7.08 13.69 3.88 5.93 4.99 11.00 8.82 5.77 8.71 3.38 8.67 7.43 5 16.10 13.59 13.86 4.59 9.03 8.09 7.06 7.70 11.55 9.33 10.17 6.68 9.47 9.39 7.14 13.71 9.24 9.44 8.24 7.13 10.04 12.49 12.92 6.71 7.43 15.34 69.36 60.20 70.14 45.93 60.03 64.81 62.22 76.34 63.08 58.45 64.04 55.03 65.93 53 54 55 56 57 58 59 60 61 62 63 64 65 DATASET2. Monthly and Annual Total Hours of Bright Sunshine in Vancouver, B.C. Year Jan. Feb. Mar. Apr. May June July Aug. Sept. Oct. Nov. Dec. Annual 1909 55.3 56.4 143.7 252.4 230.3 1910 45.5 91.5 117.7 181.2 276.5 14.4 84.8 154.0 248.2 174.9 1911 1912 44.2 81.4 233.8 113.8 215.4 1913 43.3 97.1 125.6 149.5 178.2 1914 38.7 46.4 104.1 155.8 270.2 1915 57.7 44.1 131.8 203.0 163.7 1916 65.9 84.8 86.4 141.5 216.9 1917 53.9 56.7 153.5 90.4 220.6 1918 45.3 95.8 103.5 257.5 253.0 1919 50.6 58.2 128.2 147.2 257.5 1920 44.8 148.4 108.0 196.1 249.0 1921 31.6 85.9 137.2 180.1 275.0 1922 77.4 98.5 116.5 143.7 243.2 1923 52.3 75.2 163.4 194.7 181.4 1924 24.6 59.2 157.8 168.0 250.3 1925 49.4 49.7 137.3 168.7 261.0 19Z6 27.4 56.1 187.7 220.0 186.4 1927 39.9 96.3 128.8 164.5 199.9 1928 42.6 102.7 116.2 168.4 278.4 1929 50.5 109.7 119.3 169.3 249.0 1930 100.7 85.8 169.5 142.7 254.2 1931 28.5 80.2 109.8 215.2 274.7 1932 65.4 97.0 94.6 147.4 246.3 1933 36.8 86.1 117.8 223.4 146.1 1934 56.7 118.8 136.5 237.5 236.2 1935 44.8 94.3 84.8 246.9 277.5 1936 54.6 70.5 121.6 168.5 174.0 1937 87.1 69.7 93.7 79.0 221.1 1938 30.7 78.0 110.5 175.9 313.1 1939 20.8 75.9 116.2 166.5 206.7 1940 42.9 63.4 109.0 167.2 242.6 1941 38.7 93.0 161.4 184.6 191.8 1942 51.2 75.7 101.6 120.6 140.0 1943 52.8 104.3 107.7 149.5 177.1 1944 50.9 80.7 155.7 121.4 213.0 1945 51.9 75.5 77.6 129.9 200.2 1946 38.2 42.0 95.2 88.4 287.1 251.8 186.5 217.7 
229.0 186.6 250.5 266.7 222.9 188.5 318.8 250.1 206.5 137.4 257.0 228.6 238.0 254.3 288.6 219.6 195.7 187.9 219.0 187.9 297.0 198.3 301.0 194.7 212.9 201.6 292.1 143.9 329.2 164.2 192.7 234.7 201.2 191.5 135.7 224.4 312.8 272.7 198.1 270.2 313.5 261.7 145.5 358.7 297.4 341.8 308.7 317.2 267.5 289.1 273.9 327.7 300.3 305.1 285.1 322.2 320.6 381.2 195.2 336.3 236.2 247.3 313.3 292.6 314.6 278.7 236.5 326.0 236.4 251.5 259.1 283.3 232.9 262.8 224.1 225.1 186.6 228.0 280.7 221.9 284.4 348.2 232.4 265.5 308.6 221.1 193.2 291.7 238.0 240.7 236.9 178.4 258.1 304.9 305.6 334.1 237.7 302.3 270.2 257.1 295.0 203.0 253.9 320.6 266.1 231.2 291.0 203.4 179.9 252.1 267.7 294.9 181.2 142.2 185.9 173.2 79.4 141.3 190.5 153.8 236.2 202.9 138.2 172.1 163.4 226.4 217.8 205.2 221.7 124.2 199.2 225.2 192.0 118.5 235.3 123.7 155.3 213.2 181.4 164.6 161.6 193.6 180.1 119.2 175.1 201.0 167.0 154.0 144.9 99.2 103.6 140.0 110.7 110.6 101.6 84.6 150.9 104.5 77.1 122.7 91.4 123.7 106.7 142.8 118.7 116.0 93.8 78.3 94.3 113.7 125.9 124.9 115.1 112.4 126.5 109.0 130.1 121.2 132.7 108.7 73.9 80.1 116.3 126.0 108.0 124.0 122.6 58.1 49.9 41.6 19.4 62.1 39.0 70.2 84.7 50.9 48.4 60.7 77.8 42.9 71.7 49.9 54.3 59.7 60.5 28.8 45.0 61.7 54.1 96.8 37.8 42.9 34.2 59.1 52.3 37.1 68.4 36.8 48.6 71.2 61.3 42.1 38.5 33.5 66.5 51.4 31.0 48.3 26.0 27.6 67.1 36.4 23.5 23.1 58.4 73.5 20.0 57.5 23.3 30.0 62.4 10.7 30.5 56.5 38.9 25.1 40.7 23.2 63.0 30.4 43.8 26.9 36.6 34.0 52.6 30.1 37.6 55.9 44.8 43.2 42.2 32.8 17.8 1880.7 1801.5 1763.9 1644.3 1652.0 1747.0 1683.1 1697.9 1802.8 2023.8 1958.9 1897.5 1781.7 1762.1 1925.5 1863.0 1880.4 1909.9 1720.3 1824.6 1938.5 2010.8 1975.0 1831.8 1756.5 1952.9 1855.6 1810.8 1604.7 1984.1 1698.5 1798.0 1717.3 1606.7 1693.3 1617.6 1606.3 1539.0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 6 NED GLICK [January DATA SET 2 (continued). Monthly and Annual Total Hours of Bright Sunshine in Vancouver, B.C. Year Jan. 
1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 35.8 45.9 83.3 58.5 35.9 8.9 17.3 30.7 12.8 35.0 69.1 22.6 43.7 50.9 45.0 42.1 69.2 18.9 39 30 42 46 70 46 37 61 59 Feb. 94.6 62.2 91.9 48.0 82.2 80.4 62.8 49.9 78.7 46.6 79.1 38.0 56.2 124.9 30.3 77.7 62.6 99.7 76 84 83 156 103 132 86 72 88 Mar. Apr. 149.0 134.7 105.2 63.0 121.7 83.3 94.3 155.6 106.8 102.2 96.6 131.8 77.3 120.2 80.4 116.8 98.2 67.4 222 95 123 77 152 163 127 75 106 169.5 115.0 128.5 158.4 290.0 157.1 118.9 143.3 191.2 237.4 159.5 197.4 182.9 165.8 119.9 96.0 91.3 154.9 168 189 151 175 106 176 152 136 219 May June July Aug. Sept. Oct. 223.1 143.0 271.7 231.5 203.7 221.0 189.9 218.2 182.9 312.2 248.1 308.9 236.3 135.6 203.5 119.5 277.6 212.5 253 263 177 254 280 228 264 232 259 199.3 244.1 237.9 223.0 300.2 183.0 101.6 170.9 162.0 124.6 190.7 220.5 219.9 199.4 322.6 232.8 146.1 124.0 330 181 275 224 247 262 112 168 207 259.5 207.4 247.3 309.4 316.6 332.2 278.3 240.7 196.5 333.5 201.7 365.0 331.9 388.1 310.9 245.5 177.5 210.8 310 258 319 334 311 288 352 351 280 264.5 130.6 165.6 294.3 276.8 248.4 195.5 134.1 309.1 278.5 245.8 286.7 220.3 181.6 293.8 148.8 225.3 229.5 225 281 340 216.1 215.8 297 260 325 263 193.2 184.5 209.6 222.6 201.2 215.1 169.0 116.3 148.2 147.6 227.1 164.4 128.5 210.7 152.8 170.2 166.5 169.1 179 153 215 177 131 192.0 150 165 192.4 72.9 118.5 136.6 55.0 81.5 153.1 107.4 100.0 81.3 75.5 135.2 114.6 92.8 115.6 137.8 87.0 78.6 151.6 111 112 74.3 73.6 165 149 127 182 74.1 SUMMARY OF RECORD BREAKTHROUGHS Nov. 54.5 63.2 50.6 33.9 33.3 52.3 37.0 38.6 51.5 40.5 66.3 51.3 71.6 75.3 80.8 35.4 42.0 82.4 46 46 74 43 71 75 42 73 39 Dec. 
15.9 30.5 43.5 20.1 29.1 21.8 20.0 23.4 37.1 24.8 36.2 10.9 32.1 81.3 20.9 28.3 18.5 54.0 52 26 47 60 21 40 41 64 29
Annual
1731.8 1479.6 1771.7 1747.7 1972.2 1756.6 1392.0 1421.7 1558.1 1758.4 1755.4 1912.1 1693.5 1849.4 1798.7 1400.1 1453.4 1574.8 2010 1718 1920 1836 1873 2048 2013 1904 1815
39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65

IN 65-YEAR SEQUENCES
Precipitation Jan. Record Highs Record Lows Reverse Highs Reverse Lows 4 10 7 3 Feb. 5 3 3 5 Mar. Apr. 2 4 5 4 3 4 8 3 May 6 6 42 71 June Jul. 4 7 5 5 Aug. Sep. Oct. Nov. Dec. Average 5 8 3 4 2 4 4 5 6 4 4 4 4 4 6 5 4 7 5 5 5 5 2 6 4.17 5.50 4.71 4.71 6 3 4 5 4 5 4 6 1 4 6 7 7 6 2 3 5 4 5 5 4 7 3 5 4.25 5.25 4.17 5.00
Sunshine Record Highs Record Lows Reverse Highs Reverse Lows 4 4 6 6 5 6 3 4 3 7 5 4 3 5 3 6 4 7 5 5 5 5 4 4

can be excluded (an almost valid assumption in Vancouver); and these 12 monthly precipitation sequences give an average of 5.42 lows. Similar counts of "backward" highs and lows are obtained reading the 12 sequences in reverse, from most recent back to the oldest observations. (Since the May precipitation for 1965, recorded with two decimals, ties the figure for 1966, I treat May 1965 as 1/2 in the counts of reverse order highs and lows.) Considering highs and lows, chronological and reverse order, the 48 sequences give an overall empirical average of 4.77 record breaking values per sequence (compare to the theoretical expectation of 4.76).

All these counts can be repeated for the monthly sunshine sequences in Data Set 2. The accompanying Summary of Record Breakthroughs shows all 96 counts (24 monthly data sets, considering highs and lows, forward and backward sequences). More than half of these 65-year-long sequences have exactly 4 or 5 breakthroughs.
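Counting breakthroughs in one column of data is mechanical. As a minimal sketch in Python (the function names are hypothetical, and a tie is simply not counted as a record, whereas the text above scores a tie as 1/2), the four counts for a single sequence can be tallied as follows. The sample values are the first ten January sunshine totals (1909 to 1918) from Data Set 2.

```python
def count_records(values, highs=True):
    """Count strict record highs (or lows, if highs=False) in sequence order."""
    count = 0
    best = None
    for v in values:
        if best is None or (v > best if highs else v < best):
            count += 1
            best = v
    return count


def record_summary(values):
    """Forward and reverse counts of record highs and lows for one sequence."""
    backward = list(reversed(values))
    return {
        "record_highs": count_records(values, highs=True),
        "record_lows": count_records(values, highs=False),
        "reverse_highs": count_records(backward, highs=True),
        "reverse_lows": count_records(backward, highs=False),
    }


# January hours of bright sunshine, 1909-1918 (Data Set 2).
january_sunshine = [55.3, 45.5, 14.4, 44.2, 43.3, 38.7, 57.7, 65.9, 53.9, 45.3]
print(record_summary(january_sunshine))
```

Applying `record_summary` to each full 65-year monthly column would give the kind of counts tabulated in the Summary of Record Breakthroughs, apart from the handling of ties.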
People I ask usually guess that there will be more than 5 record values in 65 years. In predicting the number of record highs for a much longer sequence, say 1,000 or 1,000,000 observations, human intuition, even among mathematicians and statisticians, is definitely extravagant compared to the simple model's expectations of 7.49 and 14.39, respectively.

Over a long time, say a hundred or a thousand or ten thousand years, there may be shifts or cycles of climate; so the basic model of "interchangeable" weather years may not fit well. If climate trends are not negligible, then the actual probabilities for record breaking will differ from what the simple model predicts. In fact, the observed frequencies of record highs and lows can be used to infer whether or not data are a random sample.*

3. Tests of randomness. At a meeting of the Royal Statistical Society about twenty-five years ago, F. G. Foster and A. Stuart [14] pointed out that record low and record high annual rainfalls at Oxford were much more rare than record breaking performances (low times or high distances) in annual track and field competitions of the British Amateur Athletic Association.

This contrast is not surprising: athletic recruiting and training have intensified over the past century; but no one has done much about the weather. Although athletic performances do fluctuate, there is an average trend over decades for national competitors (and, therefore, winners) to run faster, jump higher, or throw farther; while weather fluctuations over a century are more intuitively random, without dramatic linear trend.

Of course it is possible for 100 random observations to be ordered so that the sequence has as many as 10 or 50 or 100 record highs. But detailed calculation [6] shows that the probability of 10 or more record highs in a 100-long random sequence is less than 5%.
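The tail probability quoted above can be reproduced with a short dynamic program. The Python sketch below relies on a standard fact that is not derived at this point in the text: for exchangeable observations without ties, the events "trial i sets a record" are mutually independent with probabilities 1/i. The function names are hypothetical.

```python
def record_count_distribution(n):
    """Return dist, where dist[k] = P(exactly k record highs in n trials)."""
    dist = [1.0]  # before any trial there are zero records with certainty
    for i in range(1, n + 1):
        p = 1.0 / i  # probability that trial i is a new record high
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * (1.0 - p)  # trial i is not a record
            new[k + 1] += prob * p      # trial i is a record
        dist = new
    return dist


def prob_at_least(n, k):
    """P(at least k record highs among n trials)."""
    return sum(record_count_distribution(n)[k:])


if __name__ == "__main__":
    print(prob_at_least(100, 10))
```

The same distribution applies to record lows, so an observed count of highs or lows can be compared against these tail probabilities as an informal check on randomness.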
Therefore, in a situation where data are less familiar than rainfalls or race times, the mere finding of many record highs or lows suggests that the data are not a simple random sample; that is, an alternative hypothesis should be sought to fit the data better.

Foster and Stuart [14] gave formal procedures using the sum or the difference of record high and record low frequencies to fit or to test the hypothesis of randomness. Other statisticians have also considered such inference procedures [3, 4, 14, 15].

* Returning to the Shearman brothers, I note that their locale of weather observation moved several times. Nonetheless, official records report precipitation for "Vancouver (City)" or "Port Meteorological Office" through 1966. Beyond that date there are reports for 10 separate city locations, plus the University of British Columbia and Vancouver Airport. Since rainfalls at nearby city locations often differ by more than an inch in a single month, I have not extended the precipitation sequences in Data Set 1 beyond 1966. Also I omit 1903 and 1904 because some months of precipitation data are missing from official records for those years. For years since 1964 the monthly sunshine figures in Data Set 2 are taken from annual reports by the British Columbia provincial Department of Agriculture [5], because I could not find relevant annual publications by the federal Department of Transport. Although the provincial figures are derived from the same source, these monthly sunshine totals are rounded to the nearest whole hour, dropping one decimal. The provincial practice of first rounding and then adding the monthly figures may yield a different annual total than the federal total obtained by adding the monthly figures and then rounding-off. Moreover, other discrepancies which I noticed in comparing federal and provincial figures for 1960-1964 confirm that there are fairly frequent copying or typographic errors in one or the other publication (or both).
Also the federal 1964 summary makes a substantial mistake in summing its own monthly sunshine figures to obtain its 1964 annual total. Apart from this addition error (which I noticed by accident) I have made no attempt in Data Sets 1 and 2 to find and correct the various errors carried over from my sources. To count record highs and lows in the monthly sunshine sequences, I have broken several ties between rounded-off figures by finding more exact values from daily weather summaries. ... And there you have examples of the complications which arise in compiling real data, no matter how easy the job seems before you do it.

It can be shown that the numbers of forward and backward record highs, say R_n and R'_n, for any sequence have exactly the same joint probability distribution as the forward record highs and lows, say R_n and L_n [4]. And, as sample size n → ∞, the counts of record highs and lows become approximately independent [14].

4. Car caravans in a one-lane tunnel. When traffic moving in one direction is confined to a single lane, a slow car is likely to be followed closely by a queue of vehicles whose drivers wish to go faster, but who cannot pass. If there is no exit from this lane, then more and more following vehicles will catch up and be added to the slow moving "platoon" or "caravan" ... until there happens to be a following vehicle travelling at a lower speed. This vehicle will not catch up, but will accumulate its own caravan.

Thus cars whose drivers all desire different speeds in fact will travel in caravans at actual speeds determined by record lows in the sequence of desired speeds.

Applying the simple probability model to a random sequence of drivers, the frequency of record lows corresponds to the number of caravans formed by n drivers. And the numbers of trials between successive record breaking low values correspond to the lengths of caravans.

Since caravans will be successively slower, separations between caravans will increase as time passes. G. F.
Newell [29] used this reasoning to explain why cars near the exit of a long tunnel tend to travel faster and in smaller bunches more widely separated than cars in the tunnel near the entrance. This model of traffic flow also has been mentioned by other authors [13, 19, 29, 38].

5. Sequential strategy for destructive testing. Many products fail under stress. For example, a wood beam breaks when sufficient perpendicular force is applied to it; an electronic component ceases to function in an environment of too high temperature; and a battery dies under the stress of time. But the precise breaking stress or failure point varies even among "identical" items.

Suppose that I can observe an item's exact failure point in a laboratory by gradually increasing stress (force, temperature, time, etc.). From such destructive testing of 100 items I could find all their failure points, say X_1, X_2, ..., X_100. But now suppose that I only need to find the weakest item in my sample: I only want the minimum value among failure stresses X_1, X_2, ..., X_100. Then I need not stress most of the items to their failure points.

The minimum failure stress among any batch of items can be determined sequentially. Test the first item until it fails, and record its failure stress X_1. Stop the next test (short of failure) if the second specimen survives this amount: so the second specimen's failure stress X_2 is determined exactly if X_2 < X_1; otherwise obtain only the "censored" information that X_2 > X_1, and hence X_1 = min(X_1, X_2). In either case proceed to the third specimen and stop the test if this item survives a stress equal to min(X_1, X_2): so X_3 is observed only if X_3 < min(X_1, X_2); but min(X_1, X_2, X_3) is always determined ....

In general, the ith item survives its stress test if X_i > min(X_1, ..., X_{i-1}) = min(X_1, ..., X_i); or the test concludes with stress-to-failure if X_i = min(X_1, ..., X_i) < min(X_1, ..., X_{i-1}). In either case, the value min(X_1, ..., X_i) is known after the ith trial.
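The sequential procedure can be sketched as a simulation in Python. In the simulation every failure point is known to the program, which a laboratory would not have; the code only records which items the procedure would have carried to failure. The function names and the choice of uniformly distributed failure points are assumptions made for illustration.

```python
import random


def sequential_minimum(failure_points):
    """Return (minimum failure stress, indices of the items actually broken)."""
    current_min = None
    destroyed = []
    for i, x in enumerate(failure_points):
        if current_min is None or x < current_min:
            # First item, or an item that fails below the current minimum:
            # it is stressed to failure and its failure point is observed.
            destroyed.append(i)
            current_min = x
        # Otherwise the test stops at stress current_min and the item survives.
    return current_min, destroyed


def average_destroyed(n, trials=2000, seed=0):
    """Average number of items broken over simulated samples of size n."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]
        total += len(sequential_minimum(sample)[1])
    return total / trials


if __name__ == "__main__":
    print(sequential_minimum([5.0, 7.0, 3.0, 4.0, 1.0]))
    print(average_destroyed(100))
```

In the first example only the items at positions 0, 2 and 4 are broken; the other two are released as soon as they survive the current minimum stress.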
The items destroyed in this sequential procedure are those with "record low" failure points. The frequency of such record lows fits the same probability model as the lows in a sequence of weather records. For a sample of n items, the expected number of items destroyed is 1 + (1/2) + (1/3) + ... + (1/n). This harmonic sum grows very slowly compared to sample size n. For example, the sum is only 5.19 when n = 100 and is only 7.49 when n = 1000 (see Table C).

The sequential strategy to find the minimum value generalizes easily to find the 2, 3, ..., or j smallest values among X_1, X_2, ..., X_n. To begin, test j items until they fail, at stresses X_1, X_2, ..., X_j. Thereafter stop the ith trial if the item survives the j lowest failure stresses among all i - 1 previous specimens. The probability of stress-to-failure for the ith item (i > j) is the probability that it is among the j smallest of i independent observations from the same continuous distribution: all ranks are equally likely for X_i, so the desired probability is j/i. The expected number of items destroyed is the sum of failure probabilities over all trials:

$$\underbrace{1 + 1 + \cdots + 1}_{j\ \text{terms}} + \frac{j}{j+1} + \frac{j}{j+2} + \cdots + \frac{j}{n} = j\left(1 + \frac{1}{j+1} + \frac{1}{j+2} + \cdots + \frac{1}{n}\right).$$

If j is much less than the sample size n, then so is the expected number of failures. For example, to find the weakest 4 items in a sample of 1000, I expect to destroy only about 26; and to find the weakest 8 items I expect to destroy less than 50.

For some sorts of failure distributions, notably the exponential ([13] page 41), the minimum or the j smallest failure stresses from a sample can be used to estimate the theoretical mean of the failure distribution. Usually, however, the sample minimum is used to estimate a low percentile rather than the mean or median of the failure distribution.

6. Tolerance limits for failure distributions. A building code may prohibit use of a particular type of structural component unless it has probability at least .95 of surviving some severe stress x. In other words, the failure distribution's fifth percentile should satisfy x_.05 > x.
It is safer to under-estimate the percentile x_.05 than to over-estimate; so consider the lowest failure point observed in laboratory testing. In a very large sample it is highly probable that the breaking strength of the single weakest item is below the x_.05 value. It can be calculated, for example, that sample size n = 90 is sufficiently large to assure that x_.05 > min(X_1, X_2, ..., X_n) with probability .99; so when this sample minimum exceeds x, the inference x_.05 > x has ".99 confidence." The smallest value in a random sample of size n ≥ 90 also is called a level .99 tolerance limit for x_.05, the fifth percentile of the sampled distribution.

The tolerance limit interpretation of small order statistics is about thirty-five years old [26, 41, 45]. I encountered this subject when I joined a Statistics Committee of the Study Group on Wood Stresses under auspices of the Canadian Standards Association. (Vancouver's lumber export is important enough to be illustrated on the cover of the Annual Meteorological Summary [25] cited before.) The 1970 "Tentative Method for Evaluating Allowable Properties for Grades of Structural Lumber" [1] mentioned the sample minimum (n = 90) as a level .99 tolerance limit, although it did not suggest the sequential strategy for finding it.

The 2nd smallest sample value is a level .99 tolerance limit for x_.05 when sample size n ≥ 130; and the 3rd smallest sample value is a level .99 tolerance limit when n ≥ 165. In general, the jth smallest sample value is a level .99 tolerance limit for the fifth percentile when n ≥ n_j, where values n_j are given in Table A for j = 1, 2, 3, ..., 15. The greater the value of j, the less the variability of the corresponding tolerance limit and the less its conservative bias.

This table also gives values m_j, the mean number of failures (items destroyed) in sequential examination of n_j items to find the j weakest among them.
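The mean number of failures follows directly from the rule given in Section 5: the first j items are always broken, and the ith item (i > j) is broken with probability j/i. A minimal Python sketch, with a hypothetical function name, evaluates that sum for a few of the sample sizes mentioned in the text.

```python
import math


def expected_destroyed(j, n):
    """Mean number of items broken when seeking the j weakest of n items.

    The first j items are always broken; item i (i > j) breaks with
    probability j/i, the chance that it ranks among the j smallest so far.
    """
    return j + math.fsum(j / i for i in range(j + 1, n + 1))


if __name__ == "__main__":
    for j, n in [(1, 90), (5, 228), (4, 1000), (8, 1000)]:
        print(j, n, round(expected_destroyed(j, n), 2))
```

For j = 1 the function reduces to the harmonic sum 1 + (1/2) + ... + (1/n).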
Because the ratio m_j/n_j is about 1/10, I can afford a sample size about 10 times the number of items I can actually afford to break. For example, Table A shows that I must sample at least 228 items to use the 5th weakest as a level .99 tolerance limit for x_.05; but I expect to destroy only 23.63 of the 228 sample items.

Table A can be used in finding not only level .99, but also level .75, .90, or .95 tolerance limits for the fifth percentile. As the confidence level attached to the sample minimum is increased from .75 to .99, notice that sample size must be tripled, from 27 to 90; but the increase in expected failures, from 3.89 to 5.08, is slight. Hence, level .99 may cost little more than level .75 tolerance limits.

Table B similarly gives sample sizes n_j and expected failure numbers m_j for finding percentile x_.01 tolerance limits, at confidence level .75, .90, .95, or .99.

By now my writing has gone past the point I could ever carry a conversation. Often it is just as well to leave some air of mystery about how to calculate the minimum sample sizes in Tables A and B. But, of course, for a mathematician this computation is quite elementary.

TABLE A. Non-parametric Tolerance Limits for Fifth Percentile, x_.05

The jth smallest order statistic from a sample of n items is a non-parametric tolerance limit for the sampled distribution's fifth percentile if sample size n ≥ n_j, where n_j is given in the following tables for confidence level γ = .75, .90, .95, or .99; that is, X_(j)^(n_j) < x_.05 with probability ≈ γ. The sequential strategy to find the jth order statistic is expected to destroy m_j of the n_j items sampled.
           γ = .75                 γ = .90                 γ = .95                 γ = .99
 j    n_j    m_j  m_j/n_j    n_j    m_j  m_j/n_j    n_j    m_j  m_j/n_j    n_j    m_j  m_j/n_j
 1    27*   3.89   0.144     45    4.39   0.098     58*   4.65   0.080     90    5.08   0.056
 2    53    8.11   0.153     76*   8.83   0.116     93    9.22   0.099    130    9.90   0.076
 3    75*  11.45   0.153    105   12.46   0.119    124   13.70   0.111    165   14.56   0.088
 4   101*  15.66   0.155    132   17.52   0.133    153   18.11   0.118    197*  19.12   0.097
 5   124*  20.59   0.166    158   21.80   0.138    180*  22.45   0.125    228*  23.63   0.104
 6   147*  24.73   0.168    183*  26.04   0.142    207*  26.77   0.129    258*  28.09   0.109
 7   170   28.86   0.170    208*  30.27   0.146    234   31.09   0.133    287*  32.52   0.113
 8   192   32.96   0.172    233*  34.50   0.148    260   35.38   0.136    315*  36.91   0.117
 9   215   37.09   0.173    257*  38.69   0.151    285*  39.62   0.139    343*  41.29   0.120
10   237   41.18   0.174    281*  42.88   0.153    311   43.90   0.141    371   45.66   0.123
11   259   45.28   0.175    305*  47.07   0.154    336   48.14   0.143    398   50.00   0.126
12   281   49.37   0.176    329*  51.26   0.156    361   52.37   0.145    425   54.33   0.129
13   303   53.46   0.176    353   55.44   0.157    385*  56.57   0.147    451   58.63   0.130
14   325   57.55   0.177    376*  59.59   0.158    409*  60.77   0.149    477*  62.92   0.132
15   346*  61.60   0.178    399*  63.74   0.160    434   65.00   0.150    503*  67.21   0.134

* Asterisk indicates sample size n_j such that P{X_(j)^(n_j) < x_.05} < γ, but the probability is as close as possible to γ (i.e., for sample size n_j + 1, the probability P{X_(j)^(n_j + 1) < x_.05} > γ and the excess over γ is of greater magnitude than the deficit for sample size n_j). Absence of an asterisk means sample size n_j is such that P{X_(j)^(n_j) < x_.05} ≥ γ and the probability is as close as possible to γ.

For any proportion p, the pth percentile of a continuous probability distribution F can be defined as the unique value x_p such that F(x_p) = p. In terms of the distribution's inverse function, the pth percentile x_p = F^{-1}(p).
TABLE B. Non-parametric Tolerance Limits for First Percentile, $x_{.01}$

The $j$th smallest order statistic from a sample of $n$ items is a non-parametric tolerance limit for the sampled distribution's first percentile if sample size $n \ge n_j$, where $n_j$ is given in the following tables for confidence level $\gamma$ = .75, .90, .95, or .99; that is, $X_{(j)}^{(n)} < x_{.01}$ with probability approximately $\gamma$. The sequential strategy to find the $j$th order statistic is expected to destroy $m_j$ of the $n_j$ items sampled.

          γ = .75                γ = .90                γ = .95                γ = .99
 j    n_j    m_j  m_j/n_j    n_j    m_j  m_j/n_j    n_j    m_j  m_j/n_j    n_j    m_j  m_j/n_j
 1   138    5.51   0.040    229*   6.01   0.026    298*   6.28   0.021    458*   6.71   0.015
 2   268*  11.34   0.042    388   12.08   0.031    473   12.47   0.026    661   13.14   0.020
 3   391*  17.14   0.044    531   18.06   0.034    627*  18.56   0.030    837*  19.42   0.023
 4   510   22.92   0.045    666*  23.98   0.036    773   24.58   0.032   1001   25.62   0.026
 5   626*  28.67   0.046    797*  29.88   0.037    913   30.56   0.033   1157   31.74   0.027
 6   741*  34.42   0.046    925*  35.75   0.039
 7   855   40.15   0.047
 8   967*  45.87   0.047

* Asterisk indicates sample size $n_j$ such that $P\{X_{(j)}^{(n_j)} < x_{.01}\} < \gamma$, but the probability is as close as possible to $\gamma$ (i.e., for sample size $n_j + 1$, the probability $P\{X_{(j)}^{(n_j+1)} < x_{.01}\} > \gamma$ and the excess over $\gamma$ is of greater magnitude than the deficit for sample size $n_j$). Absence of an asterisk means sample size $n_j$ is such that $P\{X_{(j)}^{(n_j)} < x_{.01}\} \ge \gamma$ and the probability is as close as possible to $\gamma$.

It is not difficult (see [8], page 7) to show that the $j$th smallest order statistic $X_{(j)}^{(n)}$ from $n$ sample observations will satisfy $X_{(j)}^{(n)} < x_p$ with probability

$$\sum_{i=j}^{n} \binom{n}{i} p^i (1-p)^{n-i} = 1 - \sum_{i=0}^{j-1} \binom{n}{i} p^i (1-p)^{n-i}.$$

The subtracted sum is a binomial tail probability which can be evaluated using binomial tables (or a normal approximation) or a calculator. Since this sum vanishes as sample size $n \to \infty$, one can find, for any fraction $\gamma$, a sample size $n_j(\gamma)$ so large that the event $x_p > X_{(j)}^{(n)}$ has probability $\ge \gamma$ for all $n \ge n_j(\gamma)$.
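The sample-size search described above can be sketched in a few lines. This is only an illustration, not the procedure used to build Tables A and B: the function names `coverage_probability` and `min_sample_size` are hypothetical, and the sketch returns the smallest n whose coverage probability is at least γ, whereas the tables use a "closest to γ" rule, so an asterisked table entry is one less than the value returned here.

```python
from math import comb


def coverage_probability(n: int, j: int, p: float) -> float:
    """P{X_(j) < x_p}: at least j of the n observations fall below the p-th percentile."""
    # Subtract the binomial probability that fewer than j observations fall below x_p.
    fewer_than_j = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(j))
    return 1.0 - fewer_than_j


def min_sample_size(j: int, p: float, gamma: float) -> int:
    """Smallest n >= j whose coverage probability is at least gamma."""
    n = j
    # The coverage probability increases with n, so a linear search suffices.
    while coverage_probability(n, j, p) < gamma:
        n += 1
    return n
```

For the sample minimum (j = 1) the probability reduces to 1 - (1 - p)^n, so `min_sample_size(1, 0.05, 0.99)` gives 90 and `min_sample_size(1, 0.01, 0.75)` gives 138, in agreement with the unstarred entries of Tables A and B.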
Clearly this construction of tolerance limits is distribution-free: sample size $n_j(\gamma)$ does not depend on the form of the continuous distribution function $F(x) = P\{X_i < x\}$.

7. Persistence of record breaking and divergence of the harmonic series. No matter what the present precipitation record may be, it is certain that eventually there will occur a year with more rain. No matter how many record breaking years have been counted to date, there will be always one more. Such unit increments make the count of record breaking years arbitrarily large as the years of observation increase indefinitely.

These statements about rainfall interpret a specific mathematical theorem (and suggest its proof). From this point onward I emphasize formal limit results rather than applications of the theory of record values.

Formally, suppose that identically distributed random variables $X_1, X_2, \ldots, X_n$ are exchangeable (in particular, it suffices that the sequence of random variables be independent). The observation $X_i$ is called a record high or upper record value or ladder value (a term used by Feller [12]) if $X_i$ strictly exceeds all previous values in the sequence. For example, there are 3 record highs in the first 10 years of January sunshine figures (Data Set 2): 55.3, 45.5, 14.4, 44.2, 43.3, 38.7, 57.7, 65.9, 53.9, 45.3. In this sequence the record highs are $X_1 = 55.3$, $X_7 = 57.7$, and $X_8 = 65.9$.

Assume that exact ties have zero probability (arbitrarily precise measurement from a continuous distribution). Then $X_i$ is a record high if and only if $X_i = \max(X_1, \ldots, X_i)$. Since all $i$ ranks are equally likely for $X_i$, an upper record value (maximum rank) has probability $1/i$ at the $i$th trial.

Let $R_n$ denote the number of record highs among the first $n$ observations. A formal version of the statement at the beginning of this section is the following theorem:

The count of record highs $R_n \to \infty$ with probability one as sample size $n \to \infty$.

To prove this proposition, consider an initial sequence of $n_1$ observations $X_1, X_2, \ldots, X_{n_1}$ and a further batch of $n_2$ observations $X_{n_1+1}, \ldots, X_{n_1+n_2}$. The probability that this additional batch contains no new record value is

$$P\{R_{n_1+n_2} = R_{n_1}\} = P\{\max(X_1, \ldots, X_{n_1}) = \max(X_1, \ldots, X_{n_1+n_2})\} = \frac{n_1}{n_1+n_2},$$

the ratio of the initial sample size divided by the total. By taking batch size $n_2$ sufficiently large, I can make this probability as small as I wish. And I can repeat this procedure for many successive batches, choosing sizes $n_2, n_3, \ldots, n_r$ to satisfy

$$\frac{n_1}{n_1+n_2} < \frac{\varepsilon}{r}, \qquad \frac{n_1+n_2}{(n_1+n_2)+n_3} < \frac{\varepsilon}{r}, \qquad \ldots, \qquad \frac{n_1+\cdots+n_{r-1}}{(n_1+\cdots+n_{r-1})+n_r} < \frac{\varepsilon}{r},$$

so that, for arbitrary $\varepsilon > 0$,

$$\sum_{k=1}^{r} P\{\text{no record value in $k$th batch}\} < r(\varepsilon/r) = \varepsilon.$$

But the probability of obtaining at least $r$ record highs in a sequence with length $n \ge n_1 + n_2 + \cdots + n_r$ is then

$$P\{R_n \ge r\} \ge P\{\text{at least one record high in each of $r$ batches}\}$$
$$= 1 - P\{\text{at least one of $r$ batches has no record high}\}$$
$$\ge 1 - \sum_{k=1}^{r} P\{\text{no record high in $k$th batch}\} > 1 - \varepsilon.$$

Since $\varepsilon$ is arbitrarily small, conclude that $P\{R_n \ge r\} \to 1$, for arbitrarily large integer $r$. That is, the number of record highs $R_n \to \infty$ in probability as sequence length $n \to \infty$. Because the sequence $R_1 \le R_2 \le \cdots$ is monotone, it follows that also $R_n \to \infty$ with probability one. Viewed in the theory of recurrent events [12], record breaking is a persistent phenomenon.

That the expectation $E(R_n) \to \infty$ is immediate: for any integer $r$ and $n \ge n_1 + n_2 + \cdots + n_r$ as above,

$$E(R_n) = \sum_{k=1}^{n} k\,P\{R_n = k\} \ge \sum_{k=r}^{n} k\,P\{R_n = k\} \ge r\,P\{R_n \ge r\} > r(1 - \varepsilon).$$

In fact, the preceding argument is an obvious modification of the classic harmonic divergence proof with batch sizes $n_k = 2^{k-1}$ and $n = n_1 + n_2 + \cdots + n_r$. The $n$th partial harmonic sum is:

$$1 + \tfrac12 + \tfrac13 + \cdots + \tfrac1n = 1 + \left(\tfrac12 + \tfrac13\right) + \left(\tfrac14 + \tfrac15 + \tfrac16 + \tfrac17\right) + \cdots + \left(2^{r-1} \text{ terms}\right)$$
$$> 1 + \tfrac24 + \tfrac48 + \cdots + \frac{2^{r-1}}{2^r} = \frac{r+1}{2}.$$

The other harmonic divergence demonstration which is popular in calculus texts shows

$$\sum_{i=1}^{n} i^{-1} > \int_1^n x^{-1}\,dx = \ln(n).$$

Several arguments in subsequent sections exploit these properties of the harmonic series.

8. The frequency of record breaking.
Define binary variates to indicate the trials at which record highs occur in the original sequence:

$$Y_i = \begin{cases} 1 & \text{if } X_i = \max(X_1, X_2, \ldots, X_i), \\ 0 & \text{otherwise,} \end{cases}$$

with expected value and variance given by

$$E(Y_i) = P\{Y_i = 1\} = P\{X_i = \max(X_1, \ldots, X_i)\} = 1/i,$$
$$V(Y_i) = E(Y_i^2) - [E(Y_i)]^2 = 1/i - 1/i^2.$$

Any distinct pair $Y_i$ and $Y_j$ (say $i < j$) are uncorrelated since

$$E(Y_i Y_j) = P\{Y_i = 1 \text{ and } Y_j = 1\}$$
$$= P\{X_i = \max(X_1, \ldots, X_i) \text{ and } X_j = \max(X_1, \ldots, X_j)\}$$
$$= P\{X_i = \max(X_1, \ldots, X_i) < X_j = \max(X_{i+1}, \ldots, X_j)\}$$
$$= P\{X_i = \max(X_1, \ldots, X_i)\} \times P\{\max(X_1, \ldots, X_i) < \max(X_{i+1}, \ldots, X_j)\} \times P\{X_j = \max(X_{i+1}, \ldots, X_j)\}$$
$$= \frac{1}{i} \cdot \frac{j-i}{j} \cdot \frac{1}{j-i} = \frac{1}{i} \cdot \frac{1}{j}$$
$$= P\{Y_i = 1\}\,P\{Y_j = 1\} = E(Y_i)E(Y_j).$$

Thus the random number of record highs among $X_1, X_2, \ldots, X_n$ is $R_n = \sum_{i=1}^{n} Y_i$ with expectation and variance given by

$$E(R_n) = \sum_{i=1}^{n} E(Y_i) = \sum_{i=1}^{n} 1/i,$$
$$V(R_n) = \sum_{i=1}^{n} V(Y_i) = \sum_{i=1}^{n} \left(1/i - 1/i^2\right).$$

Since

$$\sum_{i=1}^{n} \frac{1}{i} - \ln(n) \to \gamma = \text{Euler's constant} = .5772\ldots \quad \text{and} \quad \sum_{i=1}^{\infty} \frac{1}{i^2} = \frac{\pi^2}{6} = 1.6449\ldots,$$

logarithm tables (or electronic calculators) easily give approximations for $E(R_n)$ and $V(R_n)$. Also both expressions are evaluated numerically in Table C for selected values of $n$: I find that the actual numbers surprise mathematicians as well as people who know little about logarithms.
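The two sums and their logarithmic approximations are easy to evaluate directly. The following is a minimal sketch, with function names (`record_count_moments`, `approx_record_count_moments`) chosen only for this illustration; it computes the exact harmonic sums and the approximations based on Euler's constant and $\pi^2/6$.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant


def record_count_moments(n: int):
    """Exact mean and variance of R_n, the number of record highs in n trials."""
    mean = sum(1.0 / i for i in range(1, n + 1))
    variance = sum(1.0 / i - 1.0 / i**2 for i in range(1, n + 1))
    return mean, variance


def approx_record_count_moments(n: int):
    """Large-n approximations: E(R_n) ~ ln(n) + gamma, V(R_n) ~ E(R_n) - pi^2/6."""
    mean = math.log(n) + EULER_GAMMA
    variance = mean - math.pi**2 / 6
    return mean, variance
```

Calling `record_count_moments(100)` returns values that round to 5.19 and 3.55, and the approximation at n = 1,000,000 rounds to 14.39 and 12.75.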
TABLE C. Expectation, variance, and standard deviation of $R_n$, the number of record values in a random sequence of $n$ independent and identically distributed observations: $E(R_n) = \sum_{i=1}^{n} 1/i$, $V(R_n) = \sum_{i=1}^{n} (1/i - 1/i^2)$.

        n    E(R_n)   V(R_n)   sqrt(V(R_n))
        2     1.50      .25       .50
        3     1.83      .47       .69
        4     2.08      .66       .81
        5     2.28      .82       .91
        6     2.45      .96       .98
        7     2.59     1.08      1.04
        8     2.72     1.19      1.09
        9     2.83     1.29      1.14
       10     2.93     1.38      1.17

       20     3.60     2.00      1.41
       30     3.99     2.38      1.54
       40     4.28     2.66      1.63
       50     4.50     2.87      1.70
       60     4.68     3.05      1.75
       65     4.76     3.13      1.77
       70     4.83     3.20      1.79
       80     4.97     3.33      1.83
       90     5.08     3.45      1.86
      100     5.19     3.55      1.88

      200     5.88     4.24      2.06
      300     6.28     4.64      2.15
      400     6.57     4.93      2.22
      500     6.79     5.15      2.27
      600     6.97     5.33      2.31
      700     7.13     5.49      2.34
      800     7.26     5.62      2.37
      900     7.38     5.74      2.40
     1000     7.49     5.84      2.42

1,000,000    14.39    12.75      3.57

The variance $V(R_n)$ gives bounds on the probability of destroying too many items in a sequential strategy for destructive testing. In particular, a one-sided Chebyshev inequality ([13] page 152) implies that, for $r > E(R_n)$,

$$P\{R_n \ge r\} \le \frac{V(R_n)}{V(R_n) + [r - E(R_n)]^2}.$$

For example, the values of $E(R_n)$ and $V(R_n)$ in Table C imply that $P\{R_{60} \ge 9\} \le .14$ and $P\{R_{1000} \ge 18\} \le .05$ (and actually these bounds are quite conservative since detailed calculation [6] shows $P\{R_{60} \ge 9\} = .022$).

In a later section I show that the binary variates $Y_1, Y_2, Y_3, \ldots$ are independent as well as pairwise uncorrelated. Because of this independence, general limit theorems can be invoked to show more precisely how the random sum $R_n = \sum Y_i$ grows probabilistically like $\ln(n)$.
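The one-sided Chebyshev bound on the number of record highs can be evaluated numerically from the harmonic sums. The sketch below is illustrative only: the function name `chebyshev_record_bound` and the inputs in the usage note (n = 60 with r = 9, and n = 1000 with r = 18) are choices made for this example.

```python
def chebyshev_record_bound(n: int, r: int) -> float:
    """One-sided Chebyshev upper bound on P{R_n >= r}, valid when r > E(R_n)."""
    mean = sum(1.0 / i for i in range(1, n + 1))
    variance = sum(1.0 / i - 1.0 / i**2 for i in range(1, n + 1))
    if r <= mean:
        raise ValueError("the one-sided bound requires r > E(R_n)")
    # V / (V + (r - E)^2), the one-sided (Cantelli) form of Chebyshev's inequality.
    return variance / (variance + (r - mean) ** 2)
```

With these inputs, `chebyshev_record_bound(60, 9)` is about .14 and `chebyshev_record_bound(1000, 18)` is about .05.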
I have noted that

$$\text{(i)} \qquad \sum_{i=1}^{n} E(Y_i) = E(R_n) \to \infty \quad \text{as } n \to \infty,$$

and that $\ln(n)$ approximates the expectation $E(R_n)$ for large $n$, so that

$$\frac{V(Y_n)}{[E(R_n)]^2} \approx \frac{1}{n[\ln(n)]^2}$$

and consequently

$$\text{(ii)} \qquad \sum_{n=1}^{\infty} \frac{V(Y_n)}{[E(R_n)]^2} < \infty.$$

Invoking Kolmogorov's convergence criterion for sums of independent random variables ([24] page 238), conditions (i) and (ii) imply that the sequence $R_n/E(R_n) \to 1$; and hence Alfred Renyi [30] obtained the following "strong law of large numbers" for the frequency of record highs (since $E(R_n)/\ln(n) \to 1$):

$R_n/\ln(n) \to 1$ with probability one as sample size $n \to \infty$,

which implies the divergence $R_n \to \infty$ discussed in Section 7.

Similarly, the criterion of Liapounov ([24] page 275) gives a "central limit theorem" for the frequency of record highs: as sample size $n \to \infty$, the limit distribution of $(R_n - \ln(n))/\sqrt{\ln(n)}$ is normal with mean = 0 and variance = 1. (Renyi also gave a "law of the iterated logarithm." Resnick [33] later derived all these limit theorems from results concerning counts in a continuous-time Poisson process; see the end of Section 12, below.) But "the asymptotic distribution is a very poor approximation for $n \le 1000$ say, which is the only region of much statistical importance" [4].

The exact probability that $R_n = r$ is complicated. Instead, consider $R_{2n} - R_n$, the number of record highs over trials $n+1, n+2, \ldots, 2n$. In particular,

$$P\{R_{2n} - R_n = 0\} = P\{\text{no record high for } n+1 \le i \le 2n\}$$
$$= P\{\max(X_1, \ldots, X_n) = \max(X_1, \ldots, X_{2n})\} = n/2n = 1/2.$$

In general, the count $R_{2n} - R_n$ has asymptotically a Poisson distribution with mean = $\ln(2)$: that is, for $k = 0, 1, 2, \ldots$,

$$P\{R_{2n} - R_n = k\} \to \frac{1}{2} \cdot \frac{[\ln(2)]^k}{k!} \quad \text{as } n \to \infty.$$

One proof uses the probability generating functions $g_i(s) = E(s^{Y_i}) = 1 + (s-1)/i$ associated with the binary variates $Y_i$. Since $R_{2n} - R_n = Y_{n+1} + Y_{n+2} + \cdots + Y_{2n}$, and the binary variates are independent, this sum has generating function given by

$$h_n(s) = \prod_{i=n+1}^{2n} g_i(s) = \prod_{i=n+1}^{2n} [1 + (s-1)/i].$$

Take logarithms of both sides and use the Taylor series expansion $\ln(1+x) = x - x^2/2 + x^3/3 - \cdots$ to obtain

$$\ln[h_n(s)] = \sum_{i=n+1}^{2n} \ln[1 + (s-1)/i]$$
$$= \sum_{i=n+1}^{2n} \left[ \frac{s-1}{i} - \frac{(s-1)^2}{2i^2} + \frac{(s-1)^3}{3i^3} - \cdots \right]$$
$$= (s-1) \sum_{i=n+1}^{2n} \frac{1}{i} - \frac{(s-1)^2}{2} \sum_{i=n+1}^{2n} \frac{1}{i^2} + \frac{(s-1)^3}{3} \sum_{i=n+1}^{2n} \frac{1}{i^3} - \cdots.$$

As $n \to \infty$, all of the above series tails vanish except for the harmonic initial term, which converges to $(s-1)[\ln(2n) - \ln(n)] = (s-1)\ln(2)$. This limit is the logarithm of a Poisson generating function with parameter $\ln(2)$. The argument can be generalized to prove the following theorem first stated by Dwass [10]:

As sample size $n \to \infty$, the frequency of record highs among observations indexed by $an < i \le bn$ (for any $b > a > 0$) is asymptotically a Poisson count with mean $\ln(b/a)$.

9. Serial numbers of record breaking trials. Rather than $R_n$, the random number of record highs in $n$ trials, consider now the random serial number $N_r$ of the trial at which the $r$th record high occurs. The number $N_r$ is called simply the $r$th record value time. Since I count the initial observation as a record value, $R_1 = 1$ and $N_1 = 1$. In general $N_r \ge r$ and, for $n \ge r$,

$$P\{N_r \le n\} = P\{R_n \ge r\}.$$

In particular, the record time given by $N_2$ has probability distribution, over the integers $i = 2, 3, \ldots$,

$$P\{N_2 = i\} = P\{X_1 = \max(X_1, \ldots, X_{i-1}) < X_i\} = \frac{1}{(i-1)\,i}.$$

Since this probability is strictly decreasing for $i = 2, 3, \ldots$, the mode = 2; but the mean is

$$E(N_2) = \sum_{i=2}^{\infty} i\,P\{N_2 = i\} = \sum_{i=2}^{\infty} \frac{1}{i-1} = \infty.$$

Since $N_2 < N_3 < \cdots$, it follows that $E(N_r) = \infty$ for all $r \ge 2$. More surprising, an argument in the next section shows that $E(N_{r+1} - N_r) = \infty$.

To study the joint distribution of $N_r, N_{r+1}, \ldots$, I must indicate why the binary variates $Y_1, Y_2, \ldots, Y_n$ defined in the preceding section are independent (a condition needed earlier to prove the Renyi and Dwass limit theorems). Consider the event that $Y_1, Y_2, \ldots, Y_n$ include exactly $r$ ones, at trials $1 = i_1 < i_2 < \cdots < i_r \le n$. This event corresponds to

$$X_1 = \max(X_1, \ldots, X_{i_2-1}) < X_{i_2} = \max(X_{i_2}, \ldots, X_{i_3-1}) < \cdots < X_{i_r} = \max(X_{i_r}, \ldots, X_n),$$

which is the intersection of independent events of the form

$$\max(X_1, \ldots, X_{i-1}) < \max(X_i, \ldots, X_j) = X_i,$$

with $i = i_{k-1}$ and $j = i_k - 1$ or $j = n$. This component probability is

$$\frac{j-(i-1)}{j} \cdot \frac{1}{j-(i-1)} = \frac{1}{j},$$

so the desired joint probability is the product

$$\frac{1}{(i_2-1)(i_3-1)\cdots(i_r-1)\,n}.$$

It is easy to check that the product of marginal probabilities for $Y_2 = 0$, $Y_3 = 0, \ldots$, $Y_{i_2-1} = 0$, $Y_{i_2} = 1$, $Y_{i_2+1} = 0, \ldots$ gives the same expression:

$$\frac{1}{2} \cdot \frac{2}{3} \cdot \frac{3}{4} \cdots \frac{i_2-2}{i_2-1} \cdot \frac{1}{i_2} \cdot \frac{i_2}{i_2+1} \cdots = \frac{1}{(i_2-1)(i_3-1)\cdots(i_r-1)\,n}.$$

The joint distribution of serial numbers $N_2, N_3, \ldots, N_r$ is therefore given by

$$P\{N_2 = i_2, N_3 = i_3, \ldots, N_r = i_r\} = \frac{1}{(i_2-1)(i_3-1)\cdots(i_r-1)\,i_r}$$

for $1 < i_2 < i_3 < \cdots < i_r$. The general marginal distribution is complicated, but expressions have been given by various authors [7, 19, 30, 37]:

$$P\{N_r = n\} = |S_{n-1}^{r-1}|/n!,$$

where the numerator uses Stirling numbers of the first kind.

Renyi [30] and David and Barton [7] used Stirling numbers to show that

$$P\{R_n = r\} = \sum_{2 \le i_2 < i_3 < \cdots < i_r \le n} P\{R_n = r;\ N_2 = i_2, N_3 = i_3, \ldots, N_r = i_r\}$$
$$= \sum_{2 \le i_2 < i_3 < \cdots < i_r \le n} \frac{1}{(i_2-1)(i_3-1)\cdots(i_r-1)\,n} = \frac{|S_n^r|}{n!} \approx \frac{[\ln(n)]^{r-1}}{n\,(r-1)!}$$

for large sample size $n$. But this approximation also can be derived using generating functions as in the preceding section. (Or see [21], page 267.)

Renyi [30] treated limit theorems for $N_r$ as duals of theorems for $R_n$. First, there is a "strong law of large numbers":

$\ln(N_r)/r \to 1$ with probability one as $r \to \infty$;

or, equivalently,

$N_r^{1/r} \to e$ with probability one.

This last result says that almost every sequence of chance observations $X_1, X_2, X_3, \ldots$ from a continuous distribution will give an arbitrarily precise estimate of the mathematical constant $e$, even when the distribution function is unknown to me! I need only the serial numbers of trials at which record highs occur. (This approach differs somewhat from statistical determinations of $e$ or of $\pi$ in experiments such as "matching" or "Buffon's needle problem" [12]. In those situations, a particular event probability is a function of $e$ or of $\pi$; and this probability is estimated by the relative frequency in repeated trials.)
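The Stirling-number expressions quoted earlier in this section for the exact distributions of the record count and of the record times can be tabulated with exact rational arithmetic. The sketch below assumes the standard recurrence c(n, k) = c(n-1, k-1) + (n-1) c(n-1, k) for unsigned Stirling numbers of the first kind; the function names are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction
from math import factorial


def unsigned_stirling_first(n_max: int):
    """Table c[n][k] of unsigned Stirling numbers of the first kind, 0 <= k <= n <= n_max."""
    c = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    c[0][0] = 1
    for n in range(1, n_max + 1):
        for k in range(1, n + 1):
            c[n][k] = c[n - 1][k - 1] + (n - 1) * c[n - 1][k]
    return c


def record_count_pmf(n: int):
    """Exact P{R_n = r} = c(n, r) / n! for r = 1, ..., n."""
    c = unsigned_stirling_first(n)
    return {r: Fraction(c[n][r], factorial(n)) for r in range(1, n + 1)}


def record_time_pmf(r: int, n: int) -> Fraction:
    """Exact P{N_r = n} = c(n-1, r-1) / n!, which is zero when n < r."""
    if n < r:
        return Fraction(0)
    c = unsigned_stirling_first(n - 1)
    return Fraction(c[n - 1][r - 1], factorial(n))
```

As a check, `record_count_pmf(3)` gives probabilities 1/3, 1/2, 1/6 for one, two, or three record highs, whose mean is 1 + 1/2 + 1/3, and `record_time_pmf(2, i)` reduces to 1/((i-1)i).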
Also Renyi [30] stated a "central limit theorem" for the random variables $N_r$: as $r \to \infty$, the distribution of $(\ln(N_r) - r)/\sqrt{r}$ is asymptotically normal with mean = 0 and variance = 1.

Renyi also gave a "law of the iterated logarithm" for $N_r$. Resnick [33] again derived Renyi's limit theorems for record times from a more sophisticated result. And Vervaat ([44] page 323) gave a Wiener process generalization of Renyi's $N_r$ central limit theorem.

Returning to the distribution of the second record time $N_2$, described at the beginning of this section, notice that for $n = 2, 3, 4, \ldots$,

$$P\left\{\frac{N_1}{N_2} < \frac{1}{n}\right\} = P\{N_2 > n\} = \sum_{i=n+1}^{\infty} P\{N_2 = i\} = \sum_{i=n+1}^{\infty} \left(\frac{1}{i-1} - \frac{1}{i}\right) = \frac{1}{n}.$$

A similar result holds for record time $N_3$ and conditional probability of the event $N_3 > n$, given that the preceding record time $N_2 = m$. More generally, for $n \ge m \ge r$,

$$P\left\{\frac{N_r}{N_{r+1}} < \frac{m}{n} \,\Big|\, N_r = m\right\} = P\{N_{r+1} > n \mid N_r = m\}$$
$$= P\{\text{no record high for } m+1 \le i \le n \mid N_r = m\}$$
$$= P\{\max(X_1, \ldots, X_m) = \max(X_1, \ldots, X_n) \mid N_r = m\} = m/n.$$

Any real $x$ in the interval (0, 1) can be approximated by a rational ratio $m/n$; and Tata [43] gave the following limit theorem:

The distribution of the ratio $N_r/N_{r+1}$ is asymptotically uniform over the unit interval: that is, for $0 < x < 1$,

$$P\left\{\frac{N_r}{N_{r+1}} < x\right\} \to x \quad \text{as } r \to \infty.$$

The proof suggests duality between this result and the theorem of Dwass in the preceding section. Given that the $r$th record high occurs at trial $N_r$, the conditional probability

$$P\left\{\frac{N_r}{N_{r+1}} < \frac12 \,\Big|\, N_r\right\} = P\{N_{r+1} > 2N_r \mid N_r\}$$
$$= P\{\text{no record high for } N_r + 1 \le i \le 2N_r \mid N_r\}$$
$$= P\{R_{2N_r} - R_{N_r} = 0 \mid N_r\} = \frac{N_r}{2N_r} = \frac12,$$

independent of $N_r$. Similarly, for $0 < x < 1$,

$$P\left\{\frac{N_r}{N_{r+1}} < x \,\Big|\, N_r\right\} = P\{N_{r+1} > N_r/x \mid N_r\} = \frac{N_r}{[N_r/x]} \to x$$

as $r \to \infty$, independent of $N_r$. (Here $[N_r/x]$ means "the largest integer $\le N_r/x$.")

The argument also indicates, as Shorrock [37] and later Resnick [33] proved, that successive ratios $N_r/N_{r+1}$, $N_{r+1}/N_{r+2}, \ldots$ are asymptotically independent uniform variates. In other words, Shorrock's theorem shows that an unknown continuous distribution can be used to approximate a uniform random number generator! I need only the numbers of the trials at which record highs occur in the original randomly sampled $X_1, X_2, X_3, \ldots$.
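The conditional law derived in this section, that the next record time exceeds n with probability m/n given the current record time m, also gives a quick way to simulate record times without generating the observations themselves. The following is a minimal sketch under that formula only: the inverse-transform step (next time = floor(m/U) + 1 for U uniform on (0, 1]) and the function name `simulate_record_times` are choices made for this illustration, not a construction taken from the text.

```python
import math
import random


def simulate_record_times(r: int, rng: random.Random):
    """Simulate the first r record times N_1 = 1 < N_2 < ... < N_r."""
    times = [1]  # the first observation always counts as a record
    while len(times) < r:
        m = times[-1]
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids division by zero
        # floor(m/u) + 1 exceeds n exactly when u <= m/n, which has probability m/n.
        times.append(math.floor(m / u) + 1)
    return times
```

For instance, repeated calls with `r = 2` yield second record times that exceed 2 about half the time, and the successive ratios of simulated record times can be inspected as approximately uniform variates.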
Tata's argument requires the asymptotic probability that the count $(R_{[n/x]} - R_n) = 0$. In the preceding section, however, the theorem of Dwass gives more: the asymptotic Poisson probability that $(R_{[n/x]} - R_n) = k$, for $k = 0, 1, 2, \ldots$. Therefore, for any positive integer $s$ and real $x$ in the unit interval,

$$P\left\{\frac{N_r}{N_{r+s}} < x \,\Big|\, N_r\right\} = P\{N_{r+s} > N_r/x \mid N_r\} = P\{R_{[N_r/x]} - R_{N_r} < s \mid N_r\}$$
$$= \sum_{k=0}^{s-1} P\{R_{[N_r/x]} - R_{N_r} = k \mid N_r\} \to x \sum_{k=0}^{s-1} \frac{(-\ln(x))^k}{k!}$$

as $r \to \infty$, independent of $N_r$. Thus a dual of the Dwass theorem generalizes Tata's result: for $s = 1, 2, 3, \ldots$ and $0 < x < 1$,

$$P\left\{\frac{N_r}{N_{r+s}} < x\right\} \to x \sum_{k=0}^{s-1} \frac{(-\ln(x))^k}{k!} \quad \text{as } r \to \infty.$$

10. Waiting times between record breaking trials. Now consider the wait from the $r$th to the $(r+1)$st record breaking trial: denote this inter-record waiting time by $W_r = N_{r+1} - N_r$. Record breaking must occur infinitely often; but it turns out that the inter-record waiting time distributions have mean = $\infty$, although mode = 1. These results, first proved by Chandler [6], can be deduced from independence of the binary variates $Y_1, Y_2, \ldots$ via the following arguments.

$$P\{W_r = k \mid N_r\} = P\{N_{r+1} = N_r + k \mid N_r\}$$
$$= P\{Y_i = 0 \text{ for } N_r + 1 \le i \le N_r + k - 1,\ Y_{N_r+k} = 1 \mid N_r\}$$
$$= \frac{N_r}{N_r+1} \cdot \frac{N_r+1}{N_r+2} \cdots \frac{N_r+k-2}{N_r+k-1} \cdot \frac{1}{N_r+k} = \frac{N_r}{(N_r+k-1)(N_r+k)}.$$

This expression is monotone decreasing as $k$ increases; so, for any value of $N_r$,

$$P\{W_r = 1 \mid N_r\} \ge P\{W_r = k \mid N_r\}, \quad \text{for } k = 1, 2, 3, \ldots.$$

Taking expectations, $P\{W_r = 1\} \ge P\{W_r = k\}$; that is, the most probable value or mode = 1.

$$P\{N_r = i,\ W_r = k\} = P\{N_r = i,\ N_{r+1} = i+k\}$$
$$\ge P\{N_2 = 2, N_3 = 3, \ldots, N_{r-1} = r-1,\ N_r = i,\ N_{r+1} = i+k\}$$
$$= \frac{1}{(2)(3)\cdots(r-2)(i-1)(i+k-1)(i+k)} \ge \frac{1}{r!\,(i-1)} \left[\frac{1}{k+i-1} - \frac{1}{k+i}\right]$$

and hence, for any integers $m$ and $i \ge r$,

$$P\{N_r = i,\ W_r \ge m\} \ge \frac{1}{r!\,(i-1)} \sum_{k=m}^{\infty} \left[\frac{1}{k+i-1} - \frac{1}{k+i}\right]$$
$$= \frac{1}{r!\,(i-1)(i-1+m)} = \frac{1}{r!\,m} \left[\frac{1}{i-1} - \frac{1}{i-1+m}\right].$$

The marginal distribution of $W_r$ satisfies

$$P\{W_r \ge m\} = \sum_{i=r}^{\infty} P\{N_r = i,\ W_r \ge m\} \ge \frac{1}{r!\,m} \sum_{i=r}^{\infty} \left[\frac{1}{i-1} - \frac{1}{i-1+m}\right] = \frac{1}{r!\,m} \sum_{i=r}^{r+m-1} \frac{1}{i-1}$$

and therefore, for arbitrarily large integers $m$,

$$E(W_r) = \sum_{k=1}^{\infty} k\,P\{W_r = k\} \ge m\,P\{W_r \ge m\} \ge \frac{1}{r!} \sum_{i=r}^{r+m-1} \frac{1}{i-1}.$$

It follows from the divergence of the harmonic series that $E(W_r) = \infty$.
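The conditional waiting-time law derived in this section, with probability m/((m+k-1)(m+k)) of waiting exactly k trials after a record at trial m, makes the infinite mean easy to see numerically. The sketch below uses exact fractions; the function names `waiting_time_pmf` and `truncated_mean` are hypothetical and serve only this illustration.

```python
from fractions import Fraction


def waiting_time_pmf(m: int, k: int) -> Fraction:
    """P{W_r = k | N_r = m} = m / ((m + k - 1)(m + k)) for k = 1, 2, 3, ..."""
    return Fraction(m, (m + k - 1) * (m + k))


def truncated_mean(m: int, k_max: int) -> Fraction:
    """Partial sum of k * P{W_r = k | N_r = m} over k = 1, ..., k_max."""
    return sum((k * waiting_time_pmf(m, k) for k in range(1, k_max + 1)), Fraction(0))
```

The probabilities telescope, so the first K of them sum to K/(m+K), which tends to one; the truncated means, by contrast, keep growing roughly like m times the logarithm of the truncation point, which is the harmonic divergence at work.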
In other words, if I pay a fixed fee and receive in return a variable dollar amount equal to the waiting time $W_r$, then my expected gain is positive and my gamble is not a "fair game" no matter how high the fee.

An alternate demonstration that $E(W_r) = \infty$ assumes that observations $X_1, X_2, X_3, \ldots$ have exponential distribution $F(x) = P\{X_i < x\} = 1 - e^{-x}$, for $x > 0$. This assumption is harmless because the preceding section shows that record times $N_r$, and hence inter-record waiting times $W_r$, have distributions which do not depend on $F$ (only that it be continuous). The conditional probability that waiting time $W_r > m$, given that the $r$th record value $X_{N_r} = x$, is just the probability that independent $X_i < x$ for indices $N_r + 1 \le i \le N_r + m$; viz., conditional probability is $[F(x)]^m = (1 - e^{-x})^m$ for exponential sampling. Moreover, Section 11 shows that the record value $X_{N_r}$ from exponential sampling has a gamma density. Hence the unconditional probability that $W_r > m$ can be represented (independent of the distribution $F$) as the integral of $(1 - e^{-x})^m$ with respect to a gamma density:

$$P\{W_r > m\} = \int P\{W_r > m \mid X_{N_r} = x\}\,dP\{X_{N_r} = x\} = \int_0^\infty (1 - e^{-x})^m\, x^{r-1} e^{-x}\,dx/(r-1)!$$

or

$$P\{W_r = m\} = P\{W_r > m-1\} - P\{W_r > m\} = \int_0^\infty \left[(1 - e^{-x})^{m-1} - (1 - e^{-x})^m\right] x^{r-1} e^{-x}\,dx/(r-1)!.$$

Notice that the geometric series

$$\sum_{m=1}^{\infty} m\left[(1 - e^{-x})^{m-1} - (1 - e^{-x})^m\right] = \sum_{m=1}^{\infty} (1 - e^{-x})^{m-1} = \frac{1}{1 - (1 - e^{-x})} = e^{x};$$

so (as already shown by a different argument) the expectation

$$E(W_r) = \sum_{m=1}^{\infty} m\,P\{W_r = m\} = \int_0^\infty x^{r-1}\,dx/(r-1)! = \infty.$$

(I am trying to avoid integral calculus in this paper, but the preceding argument is too ingenious to omit.)

For large values of $r$, the gamma can be approximated by a normal density. Neuts [27, 28] used such approximation in the integral expression for $P\{W_r > m\}$ to prove a "law of large numbers" for waiting times: as $r \to \infty$,

$$\frac{\ln(W_r)}{r} \to 1$$

in probability (also: with probability one [20]). By the same method, Neuts gave a "central limit theorem": as $r \to \infty$, the distribution of $(\ln(W_r) - r)/\sqrt{r}$ is asymptotically normal with mean = 0 and variance = 1.
There is also a "law of the iterated logarithm" for waiting times [42]. Shorrock ([36] page 222) and Resnick ([33] page 867) include these limit results in more general theorems for inter-record waiting times.

The remarkable similarity between limit theorems for record times and for inter-record waiting times suggests that the wait $W_r$ for the next record value is somehow comparable in scale to the sum of all past waiting times $W_1 + W_2 + \cdots + W_{r-1} = N_r - 1$. Resnick [33] connected the limit behaviour of $W_r$ and $N_r$ in the following theorem: with probability one,

$$\limsup_{r \to \infty} \frac{|\ln(W_r) - \ln(N_r)|}{\ln(r)} = 1.$$

Since $W_r \approx e^r$ in some probabilistic limit sense, successive waits must have increasing median values, although they all have mode = 1 and mean = $\infty$. Numeric computations [19] show that

$$\frac{\text{median}(W_{r+1})}{\text{median}(W_r)} \approx e = 2.718\ldots$$

even for $r$ = 4, 5, 6, 7, 8. Note that "small $r$" does not mean "small sample size": Table C shows that fewer than 8 record highs are expected in a sample of size $n$ = 1000.

r                          2      3      4      5      6      7      8
median(W_r)                4     10     26     69    183    490   1316
med(W_r)/med(W_{r-1})           2.50   2.60   2.65   2.65   2.68   2.69

The end of the preceding section found independent uniform limit distributions for $N_r/N_{r+1}$ and $N_{r+1}/N_{r+2}$; and hence, as $r \to \infty$, the waiting time ratio

$$\frac{W_{r+1}}{W_r} = \frac{N_{r+2} - N_{r+1}}{N_{r+1} - N_r} = \frac{(N_{r+2}/N_{r+1}) - 1}{1 - (N_r/N_{r+1})}$$

asymptotically has the same distribution as

$$\frac{(1/V) - 1}{1 - U} \quad \text{or} \quad \frac{(1/V) - 1}{U},$$

where $U$ and $V$ are independent random variables uniform over the interval (0, 1). For any $x > 0$,

$$P\left\{\frac{(1/V) - 1}{U} > x\right\} = P\{V < (xU + 1)^{-1}\} = \int_0^1 \int_0^{1/(xu+1)} dv\,du = \int_0^1 \frac{du}{xu+1} = \frac{\ln(1+x)}{x}.$$

Thus Shorrock [37] obtained the following limit result for ratios of waiting times: for any $x > 0$,

$$P\left\{\frac{W_{r+1}}{W_r} > x\right\} \to \frac{\ln(1+x)}{x} \quad \text{as } r \to \infty.$$

11. The record value sequence. Little has been said so far about actual record values, i.e., magnitudes. Rather I have focused on frequencies and serial positions of record values.

Results concerning the record frequency $R_n$ in a random sample $X_1, X_2, \ldots, X_n$, or concerning the index number $N_r$ of the $r$th record value trial are distribution-free results: they presume identically distributed and independent (or exchangeable) random variables sampled from a continuous probability distribution; but the distribution function $F$ never appears in the theorems.

The actual record values $X_1 < X_{N_2} < X_{N_3} < X_{N_4} < \cdots$ present a quite contrary situation. Results concerning this record value sequence per se definitely depend on the particular distribution $F$, but do not involve positions in the original random sequence.

Many properties of the record value sequence from continuous distribution $F$ can be stated most conveniently in terms of the transform

$$G(x) = -\ln[1 - F(x)].$$

The derivative $G'(x) = F'(x)/[1 - F(x)]$ is the failure rate or hazard rate which plays a central role in mathematical theory of reliability (see [2] p. 53). Both $F$ and $G$ are strictly increasing functions over their support intervals, so there exist respective inverses $F^{-1}$ and $G^{-1}$,

$$G^{-1}(x) = F^{-1}(1 - e^{-x}).$$

Specific examples will illustrate the distinction between distribution-free results for record value times in random sampling and distribution-dependent results for the record value sequence itself. The theorem of Dwass at the end of Section 8 asserts that, for $0 < a < b$, the count of record value trials indexed between $an$ and $bn$ has asymptotically a Poisson distribution with mean $\ln(b/a)$. This result clearly is asymptotic (valid as $n \to \infty$) since there can be at most $(b-a)n$ record indices between $an$ and $bn$, while Poisson distribution permits an arbitrarily large frequency. By contrast, for any $\alpha < \beta$ such that $0 < F(\alpha) < F(\beta) < 1$, arbitrarily many of the records $X_1 < X_{N_2} < X_{N_3} < \cdots$ can take values between $\alpha$ and $\beta$. Dwass [10] proved that this count is exactly Poisson distributed, but with mean $G(\beta) - G(\alpha) = -\ln\{[1 - F(\beta)]/[1 - F(\alpha)]\}$, depending on the distribution function $F$.
The fact that exact distributions are obtained more easily than most limit results for record values is a further contrast to results for record value times.

The distribution of the $r$th record value was required in the preceding section's calculus argument concerning waiting time $W_r$. Let positive integers $w_1, w_2, \ldots, w_{r-1}$ denote fixed waits, and specify fixed trial numbers $n_2 = 1 + w_1$, $n_3 = n_2 + w_2, \ldots$, $n_r = n_{r-1} + w_{r-1}$. Consider a random sample $X_1, \ldots, X_{n_2}, \ldots, X_{n_r}, \ldots$ such that

$X_1 = x_1 >$ next $(w_1 - 1)$ observations,
$X_{n_2} = x_2 >$ next $(w_2 - 1)$ observations,
...
$X_{n_{r-1}} = x_{r-1} >$ next $(w_{r-1} - 1)$ observations,
$X_{n_r} = x_r$.

Since $X_1, X_2, X_3, \ldots$ are independent and identically distributed with $P\{X_i < x\} = F(x)$, the event above has probability

$$dF(x_1)[F(x_1)]^{(w_1-1)} \times dF(x_2)[F(x_2)]^{(w_2-1)} \times \cdots \times dF(x_{r-1})[F(x_{r-1})]^{(w_{r-1}-1)} \times dF(x_r),$$

where $dF(x) = F'(x)\,dx$ and the derivative $F'$ is a probability density function. For increasing values $x_1 < x_2 < \cdots < x_r$, the event above is equivalent to

$$X_1 = x_1,\ W_1 = w_1,\quad X_{N_2} = x_2,\ W_2 = w_2,\quad \ldots,\quad X_{N_{r-1}} = x_{r-1},\ W_{r-1} = w_{r-1},\quad X_{N_r} = x_r.$$

Hence summation of the foregoing probability element over all possible waiting times gives the joint probability that $X_1 = x_1$, $X_{N_2} = x_2, \ldots$, $X_{N_r} = x_r$. This sum can be expressed as a product of geometric series:

$$\sum_{w_1=1}^{\infty} \sum_{w_2=1}^{\infty} \cdots \sum_{w_{r-1}=1}^{\infty} dF(x_1)[F(x_1)]^{(w_1-1)} \cdots dF(x_{r-1})[F(x_{r-1})]^{(w_{r-1}-1)}\,dF(x_r)$$
$$= dF(x_1) \sum_{w_1=0}^{\infty} [F(x_1)]^{w_1} \cdots dF(x_{r-1}) \sum_{w_{r-1}=0}^{\infty} [F(x_{r-1})]^{w_{r-1}}\,dF(x_r)$$
$$= \frac{dF(x_1)}{1 - F(x_1)} \cdots \frac{dF(x_{r-1})}{1 - F(x_{r-1})}\,dF(x_r)$$
$$= dG(x_1) \cdots dG(x_{r-1})\,dF(x_r).$$

Iterated integration with respect to $x_1, x_2, \ldots, x_{r-1}$ (over the region $x_1 < x_2 < \cdots < x_r$) shows that $X_{N_r}$ has probability element

$$dP\{X_{N_r} = x_r\} = \frac{[G(x_r)]^{r-1}}{(r-1)!}\,dF(x_r).$$

This argument was suggested by Karlin in a textbook exercise ([21] pages 267-268; see also [31] page 69). If observations $X_1, X_2, X_3, \ldots$ have exponential distribution $F(x) = 1 - e^{-x}$ and $G(x) = -\ln[1 - F(x)] = x$, then the $r$th record value has gamma probability density $x^{r-1} e^{-x}/(r-1)!$, as asserted in the preceding section.
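The iterated integration over the region $x_1 < x_2 < \cdots < x_r$ is stated without detail; a short induction supplies the missing step. The sketch below introduces the symbol $I_k$ purely for this illustration and assumes that $G$ vanishes at the lower endpoint of the support of $F$ (which holds because $F = 0$ there).

```latex
\begin{aligned}
I_k(x) &= \int\!\cdots\!\int_{x_1 < \cdots < x_k < x} dG(x_1)\cdots dG(x_k),
  \qquad I_0(x) = 1, \\[4pt]
I_k(x) &= \int_{x_k < x} I_{k-1}(x_k)\, dG(x_k)
        = \int_{x_k < x} \frac{[G(x_k)]^{k-1}}{(k-1)!}\, dG(x_k)
        = \frac{[G(x)]^{k}}{k!}, \\[4pt]
dP\{X_{N_r} = x_r\} &= I_{r-1}(x_r)\, dF(x_r)
        = \frac{[G(x_r)]^{r-1}}{(r-1)!}\, dF(x_r).
\end{aligned}
```

The middle line is the inductive step: substituting the hypothesis $I_{k-1}(x_k) = [G(x_k)]^{k-1}/(k-1)!$ and integrating a power of $G$ against $dG$ raises the exponent by one and supplies the factor $1/k$.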
Moreover, for $X$ from any continuous distribution $F$, it is easy to show that the transformed variable $F(X)$ has uniform distribution on the unit interval ([21] page 237) and $G(X)$ has standard exponential distribution. Since $G$ is strictly increasing over its support, the record value trials in random sampling $X_1, X_2, X_3, \ldots$ from $F$ correspond to record value trials in a sample $G(X_1), G(X_2), G(X_3), \ldots$ from the exponential distribution. Thus transformed record value $G(X_{N_r})$ has gamma distribution with $r$ degrees of freedom.

For large $r$ the gamma distribution is approximately normal. Thus Resnick [31] obtained a "central limit theorem": as $r \to \infty$, the distribution of $(G(X_{N_r}) - r)/\sqrt{r}$ is asymptotically normal with mean = 0 and variance = 1. The corresponding "strong law of large numbers" asserts that

$G(X_{N_r})/r \to 1$ with probability one as $r \to \infty$.

Resnick [31] (also see Shorrock [37]) gave explicit conditions on the function $G$ which are necessary and sufficient for convergences in probability

$$X_{N_r} - G^{-1}(r) \to 0, \qquad X_{N_r}/G^{-1}(r) \to 1.$$

But also there are conditions under which the last ratio converges to a non-degenerate random variable rather than to a constant. Indeed, Resnick [31] characterized the types of limit distributions for record values $X_{N_r}$; depending on the sampled distribution function $F$, a record value sequence satisfies exactly one of the following convergences in distribution as $r \to \infty$:

$$\text{(i)} \qquad P\left\{\frac{X_{N_r} - G^{-1}(r)}{G^{-1}(r + \sqrt{r}) - G^{-1}(r)} < x\right\} \to \Phi(x);$$

$$\text{(ii)} \qquad P\left\{\frac{X_{N_r}}{G^{-1}(r)} < x\right\} \to \begin{cases} 0 & \text{if } x \le 0, \\ \Phi[\alpha \ln(x)] & \text{if } x > 0; \end{cases}$$

$$\text{(iii)} \qquad P\left\{\frac{X_{N_r} - x_0}{x_0 - G^{-1}(r)} < x\right\} \to \begin{cases} \Phi[-\alpha \ln(-x)] & \text{if } x < 0, \\ 1 & \text{if } x \ge 0, \end{cases}$$

where $\Phi$ denotes the standard normal distribution function, $\alpha$ is a positive constant depending on $F$, and $x_0$ in (iii) is the upper (necessarily finite) endpoint of the support interval of distribution $F$.

I omit more precise statements of limit theorems for record value sequences because these results involve complicated conditions on $F$ or $G$. For details see Resnick [31] and also [9, 16, 32, 34, 35, 36, 37, 38, 39, 40, 43, 44].

12. Extreme values and extremal processes. Every random sample $X_1, X_2, \ldots, X_n$ has a sample maximum or upper extreme value $M_n = \max(X_1, X_2, \ldots, X_n)$. Clearly $M_1 = X_1$ and subsequently every new maximum is a record value, i.e., record breaking corresponds to a jump or strict inequality $M_n < M_{n+1}$ in the sequence of sample maxima $X_1 \le M_2 \le M_3 \le M_4 \le \cdots$.

Thus a record value sequence can be extracted from a maximal sequence which is already removed from the random sequence; but the converse path is impossible. So maximal sequences might logically be studied before record value sequences and such was the historical precedence, contrary to my arrangement of topics. Gumbel [18] has given the early history and extensive bibliography on statistics of extremes, including distinctive applications: for example, using a river's past flood levels (annual maxima of daily observations) to plan dams, etc., with sensible allowance for worse floods in the future (see [18], page 236, for analysis of Mississippi River flooding at Vicksburg).

Historically, studies of sample maxima also have been concomitant to the more general subject of order statistics [8], viz., randomly sampled data filed or ranked from smallest value to largest value. In the terms of statistical decision theory, ranked data constitute "sufficient statistics" both for "classical" procedures and for most "nonparametric" methods (of which the tolerance limit construction in Section 6 is one example). Unlike the frequency of record breaking, common sample statistics, such as mean or median annual rainfall, can be computed from ordered data as well as from values in their random sampling sequence (in fact, ranking is generally the fastest way to find the median or other sample percentiles).

In 1943 B. V. Gnedenko [17] showed that there are precisely three types of limit distributions for sequences of sample maxima. Gnedenko's three types of extreme value distributions correspond exactly to the three limit laws for record values published by Resnick [31] in 1973 and mentioned briefly in the preceding section.
The limit laws for maxima and for record values from continuous distribution $F$ are linked by Resnick in a duality theorem. That is, the different extreme value distributions partition the space of all continuous distribution functions into disjoint "domains of attraction," and record value distributions determine exactly the same partition. See [32] for further comparison of record values and maxima.

* * *

A sequence of sample maxima $M_n$ plotted as a function of sample size $n$ can be regarded (see Shorrock [39]) as a discrete-time Markov process with jumps at record value times $N_r$. Common techniques in stochastic process theory can be used to construct an analogous continuous-time Markov jump process $M(t)$ such that, for any times $0 \le t_1 < t_2 < \cdots < t_n$, the variables $M(t_1) \le M(t_2) \le \cdots \le M(t_n)$ have the same joint distribution as sample maxima $M_1 \le M_2 \le \cdots \le M_n$.

The study of such continuous-time extremal processes seems to have been originated by Dwass [10, 11] and Lamperti [23] in the mid-1960s. Dwass used record value theorems to "motivate some of the results" for extremal processes. But, conversely, Shorrock [38, 39] and Resnick [33, 34, 35] used continuous-time processes to obtain results for record values. In particular, Resnick [33] showed that a continuous-time extremal process $M(t)$ with jump times according to a non-homogeneous Poisson process (with intensity $t^{-1}$) has a "discrete skeleton" $M(n)$, $n = 1, 2, 3, \ldots$, whose jump times behave precisely as record value times (for $n$ sufficiently large).

Thus Resnick [33] used the framework of extremal processes "to give a unified explanation of known limit laws" for record frequencies $R_n$, for record value trial numbers $N_r$, and for inter-record waiting times $W_r$.

This study was supported by the National Research Council of Canada, Grant A8044.

References

Asterisks indicate references most relevant to the theory of record values.

1. American Society for Testing and Materials (1970) "Tentative method for evaluating allowable properties for grades of structural lumber". 1916 Race Street, Philadelphia, Pa. 19103.
2. R. E. Barlow and F. Proschan, Statistical Theory of Reliability and Life Testing, Holt, Rinehart & Winston, New York, 1975.
3.* D. E. Barton and C. L. Mallows, The randomization bases of the problems of the amalgamation of weighted means, J. Roy. Statist. Soc., Ser. B, 23 (1961) 423-433.
4.* D. E. Barton and C. L. Mallows, Some aspects of the random sequence, Ann. Math. Statist., 36 (1965) 236-260.
5. British Columbia Department of Agriculture, Climate of British Columbia: tables of temperature, precipitation, and sunshine, Annual reports for 1960-1974. Victoria, B.C., Canada.
6.* K. N. Chandler, The distribution and frequency of record values, J. Roy. Statist. Soc., Ser. B, 14 (1952) 220-228.
7.* F. N. David and D. E. Barton, Combinatorial Chance, Hafner, New York, 1962, pp. 178-183.
8. H. A. David, Order Statistics, Wiley, New York, 1970.
9. L. de Haan and S. I. Resnick, Almost sure limit points of record values, J. Appl. Probability, 10 (1973) 528-542.
10.* M. Dwass, Extremal processes, Ann. Math. Statist., 35 (1964) 1718-1725.
11. M. Dwass, Extremal processes, II. Illinois J. Math., 10 (1966) 381-391.
12. W. Feller, An Introduction to Probability Theory and Its Applications, Volume I, 3rd ed., Wiley, New York, 1968.
13.* W. Feller, An Introduction to Probability Theory and Its Applications, Volume II, 2nd ed., Wiley, New York, 1971.
14.* F. G. Foster and A. Stuart, Distribution-free tests in time-series based on the breaking of records, with discussion, J. Roy. Statist. Soc., Ser. B, 16 (1954) 1-22.
15.* F. G. Foster and D. Teichroew, A sampling experiment on the powers of the records tests for trend in a time series, J. Roy. Statist. Soc., Ser. B, 17 (1955) 115-121.
16.* W. Freudenberg and D. Szynal, Limit laws for a random number of record values, Bull. Acad. Polon. Sci. Ser. Sci. Math., Astr. Phys., 24 (1976) 193-199.
17. B. V. Gnedenko, Sur la distribution limite du terme maximum d'une serie aleatoire, Ann. of Math., 44 (1943) 423-453.
18. E. J. Gumbel, Statistics of Extremes, Columbia University Press, New York, 1958.
19.* D. Haghighi-Talab and C. Wright, On the distribution of records in a finite sequence of observations, with an application to a road traffic problem, J. Appl. Probability, 10 (1973) 556-571.
20.* P. T. Holmes and W. E. Strawderman, A note on the waiting times between record observations, J. Appl. Probability, 6 (1969) 711-714.
21.* S. Karlin, A First Course in Stochastic Processes, Chapter 9. Academic Press, New York, 1966.
22. W. Kruskal, Statistics, Moliere and Henry Adams, American Scientist, 55 (1967) 416-428.
23. J. Lamperti, On extreme order statistics, Ann. Math. Statist., 35 (1964) 1726-1737.
24. M. Loeve, Probability Theory, 3rd ed. Van Nostrand, Princeton, N.J., 1963.
25. Meteorological Branch, Annual meteorological summary with comparative data: Vancouver Airport 1937-1964; Vancouver City 1900-1964. Department of Transport, Ottawa, Canada, 1964.
26. R. B. Murphy, Non-parametric tolerance limits, Ann. Math. Statist., 19 (1948) 581-589.
27.* M. Neuts, Waiting times between record observations, J. Appl. Probability, 4 (1967) 206-208.
28.* M. Neuts, Probability, Allyn & Bacon, Boston, 1973, pp. 299-300, 432.
29.* G. F. Newell, A theory of platoon formation in tunnel traffic, Operations Res., 7 (1959) 589-598.
30.* A. Renyi, Theorie des elements saillants d'une suite d'observations, with summary in English. Colloquium on Combinatorial Methods in Probability Theory, (1962) 104-117. Matematisk Institut, Aarhus Universitet, Denmark.
31.* S. I. Resnick, Limit laws for record values, Stochastic Processes and Their Applications, 1 (1973) 67-82.
32.* S. I. Resnick, Record values and maxima, Annals of Prob., 1 (1973) 650-662.
33.* S. I. Resnick, Extremal processes and record value times, J. Appl. Probability, 10 (1973) 864-868.
34.* S. I. Resnick, Weak convergence to extremal processes, Annals of Prob., 3 (1975) 951-960.
35.* S. I. Resnick and M. Rubinovitch, The structure of extremal processes, Advances in Appl. Probability, 5 (1973) 287-307.
36.* R. W. Shorrock, A limit theorem for inter-record times, J. Appl. Probability, 9 (1972) 219-223. Correction on page 877.
37.* R. W. Shorrock, On record values and record times, J. Appl. Probability, 9 (1972) 316-326.
38.* R. W. Shorrock, Record values and inter-record times, J. Appl. Probability, 10 (1973) 543-555.
39.* R. W. Shorrock, On discrete time extremal processes, Advances in Appl. Probability, 6 (1974) 580-592.
40.* M. M. Siddiqui and R. W. Biondini, The joint distribution of record values and inter-record times, Annals of Prob., 3 (1975) 1012-1013.
41. P. N. Somerville, Tables for obtaining non-parametric tolerance limits, Ann. Math. Statist., 29 (1958) 599-601.
42.* W. E. Strawderman and P. T. Holmes, On the law of the iterated logarithm for inter-record times, J. Appl. Probability, 7 (1970) 432-439.
43.* M. N. Tata, On outstanding values in a sequence of random variables, Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 12 (1969) 9-20.
44.* W. Vervaat, Limit theorems for records from discrete distributions, Stochastic Processes and Their Applications, 1 (1973) 317-334.
45. S. S. Wilks, Determination of sample sizes for setting tolerance limits, Ann. Math. Statist., 12 (1941) 91-96.

DEPARTMENT OF MATHEMATICS, AND DEPARTMENT OF HEALTH CARE AND EPIDEMIOLOGY, UNIVERSITY OF BRITISH COLUMBIA, VANCOUVER, BR. COLUMBIA V6T 1W5, CANADA.
Putnam in memory of her husband and is held under the auspices of the Mathematical Association of America. The first prize, five hundred dollars, was awarded to the Department of Mathematics of the California Institute of Technology, Pasadena, California. The members of its winning team were Christopher L. Henley, Karl W. Heuer, and Albert L. Wells, Jr.; each was awarded a prize of one hundred dollars. The second prize, four hundred dollars, was awarded to the Department of Mathematics of
