Looking Backwards to the
Future
Tony Lawrance
Department of Statistics
University of Warwick
1
First of all, sincere thanks for making this such a great day for me (provisional remark…)
Especially –
John
Theodore
and thanks to the Statistics Department for ‘sponsoring’ the event
2
Looking backwards to the future – what does it mean ?
An excuse to briefly look back on an enjoyable time in statistics with a wish to also
look forward to some more time in statistics… Will try and pin the talk on some
significant and not so significant events in my statistics life
Nearly 40 years of statistics before Warwick – so some reminiscing here for the first
time may be acceptable…
In Warwick for just less than 10 years – but very enjoyable ones
Most of my publications are now on the site ‘researchgate.net’
Diary of Life
Maths undergraduate in Leicester – graduated 1963
‘Intimidated’ into statistics by Nageeb Rahman, a Cambridge PhD student of Henry
Daniels – in that, I am the two-year elder ‘statistical brother’ of Phil Brown
Nageeb sent me in 1963 to Aberystwyth for an MSc (and then Phil Brown in 1965)
because Dennis Lindley from Cambridge had started a Stats Department there in
1960, with David Bartholomew, Mervyn Stone and Ann Mitchell (Dennis was in
Harvard for half my year, but taught frequentist inference in the second term)
3
Department of Statistics, Aberystwyth 1963-64
Gwyn Jones, Mike Samworth, Pgsly Gwynne, Graham Phipp, ^, Jeff Wood, Clive Payne, ? Bambegye, Basil Springer, Eryl Basset, Richd Morton
Carol ?, Donald East, Sylvia Lutkins, David Bartholomew, Dennis Lindley, Mervyn Stone, Ann Mitchell, Peter King, Eileen ?
4
The IBM 1620 Electronic Computer, Aberystwyth Stats Dept 1963
Out of bounds to MSc students
5
Diary of Life
After MSc, back to Leicester in October 1964 - started as a tutorial assistant for 1 year -> assistant lecturer
Frank Downton, d 1986 ?
Nageeb Rahman, d 90’s ?
Mike Phillips – 1968-…
Brian English – 1969-70?
Took 4 ‘summers’ to get a PhD, ‘Stochastic Point Processes’, awarded in 1969.
Started by Frank Downton giving me a sheet with a few references …
To 7
Lightly supervised by Frank Downton, who almost immediately after my arrival back
in Leicester moved to Birmingham, enticed by Henry Daniels
Nevertheless, Frank Downton had a big influence, encouraging me and building my
research confidence…
Another big influence in supporting my career was my external examiner David Cox
So this seems a good point to get a bit more technical
6
(back to 6)
7
PhD and Point Processes…
Time series of point events on the line – mainly Poisson and renewal processes at the
time – spatial or dependent interval versions had not been much considered
I went for dependent interval versions with stationarity and first studied Cox’s 1954
Biometrika paper on ‘superposition of renewal processes’ or ‘pooled processes’
[Figure: point events of Process 1, Process 2 and their Superposition, on a time axis]
What was the inter-point distribution and dependency of this process ?
My first issue was what was meant by a ‘typical event’ to start an interval in a
stationary point process ?
I wrote to David Cox – good question, he said ! “We have avoided it in my just
completed Methuen monograph with Peter Lewis” on ‘Series of Events’ – 1966
(I hope my memory is correct !)
So after a while I investigated two ideas…
8
An Average Event – an interval beginning with an ‘average event’ in the
stationary PP with intervals $X_1, X_2, \ldots$ has distribution
$$P(X \le x) = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} P(X_i \le x)$$
…a bit clunky
An Arbitrary Event - a more elegant approach follows from Khintchine’s (1955)
work on stationary input processes for queues (To 10). This developed from ‘Palm
distributions’, referencing Palm (1943), who introduced the idea of an interval
beginning with ‘at least one point’ in a telephone queuing context
Thus, with $N(t, t+\tau)$ the counting variable in a stationary point process, the
definition of the distribution of an interval beginning with an arbitrary event is
$$P(X > x) = \lim_{\tau \to 0} P\{N(\tau, \tau + x) = 0 \mid N(0, \tau) \ge 1\}$$
It turned out that this definition mathematically connected the idea of an arbitrary
event with that of an arbitrary time, and involved length-biased sampling and
forward and backward recurrence times – previously informal concepts for a
general stationary point process
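The link between arbitrary events, arbitrary times and length-biased sampling can be seen in a small simulation. This is only an illustrative sketch, not taken from the thesis: it assumes a renewal process with gamma intervals (shape 2, scale 0.5, so mean 1), and the function names are hypothetical. Intervals started at an event should average E(X) = 1, the interval covering an arbitrary time should average E(X²)/E(X) = 1.5 (length bias), and the forward recurrence time should average E(X²)/{2E(X)} = 0.75.

```python
import bisect
import random


def renewal_event_times(n_intervals, shape, scale, rng):
    """Event times of a renewal process with gamma(shape, scale) intervals."""
    times, t = [0.0], 0.0
    for _ in range(n_intervals):
        t += rng.gammavariate(shape, scale)
        times.append(t)
    return times


def compare_sampling(n_intervals=100_000, n_times=20_000,
                     shape=2.0, scale=0.5, seed=1):
    """Return (mean interval from an event,
               mean interval covering an arbitrary time,
               mean forward recurrence time)."""
    rng = random.Random(seed)
    times = renewal_event_times(n_intervals, shape, scale, rng)
    intervals = [b - a for a, b in zip(times, times[1:])]
    mean_from_event = sum(intervals) / len(intervals)

    covering_total = forward_total = 0.0
    for _ in range(n_times):
        s = rng.uniform(times[0], times[-1])       # an arbitrary time
        j = bisect.bisect_right(times, s)          # times[j-1] <= s < times[j]
        j = min(j, len(times) - 1)                 # guard the right end point
        covering_total += times[j] - times[j - 1]  # length-biased interval
        forward_total += times[j] - s              # forward recurrence time
    return mean_from_event, covering_total / n_times, forward_total / n_times
```

Calling `compare_sampling()` returns the three sample means, which can be set against 1, 1.5 and 0.75 for this assumed gamma case.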
My thesis work also contained work on this arbitrary event approach and on
particular point processes…
9
Khintchine (1894-1959). Mathematical Methods of Queuing, 1955, English Eds,
1960, 1969, Griffin
From the introduction…
(back to 9)
10
Diary of Life
My First Seminar was 25 Feb 1970 at UMIST, Manchester, on ‘selective
interaction of point processes’, one of my PhD point processes
My Most Recent Seminar reconstructed part of my first seminar at the Maurice
Priestley memorial meeting, 18 December 2013…
The selective interaction model was introduced by the Dutch neurophysiologists Ten
Hoopen and Reuver (1965, 1967) to explain multi-modal inter-spike distributions for
dark firing of lateral geniculate neurons, observed by Bishop et al (1964)
The process can be explained as follows - you can see that I was rather keen on
graphics even in those distant days…
(from my thesis)
I explored it as an applied probability model. I really wish now that I had followed up on
the statistical aspects, contacting the experimenters, analysing their data, attempting to
collaborate, etc, and doing simulations – but there was little electronic computing and
no internet, and Holland was a long way away
11
from Priestley meeting talk
The Selective Interaction Neuron Firing Point Process Model
[Diagram: an excitatory stationary stochastic point count process and an inhibitory stationary interval process ($I_i$, $f_I(y)$) feed the selective interaction process, which gives the observed response]
The model was justified empirically by a multi-modal distribution of times between
the ‘responses’ in the ‘spike trains’ of observed neuron firings – convolutions of
excitatory intervals
Poisson excitatory results by very detailed calculation – in my thesis
General results by appealing to the compound distribution structure of the observed
response count, resulting in
$$N_R(t) = N_E(t) - \sum_{i=1}^{N_I(t)} \delta^{i}_{E,I}, \qquad \delta^{i}_{E,I} = \begin{cases} 1 & \text{with prob } P\{N_E(I_i) > 0\} = 1 - P\{N_E(I_i) = 0\} \\ 0 & \text{otherwise} \end{cases}$$
12
Continued (J Appl Prob papers 1970-71 & 1979)
$$N_R(t) = N_E(t) - \sum_{i=1}^{N_I(t)} \delta^{i}_{E,I}, \qquad \delta^{i}_{E,I} = \begin{cases} 1 & \text{with prob } P\{N_E(I_i) > 0\} = 1 - P\{N_E(I_i) = 0\} \\ 0 & \text{otherwise} \end{cases}$$
[Diagram: excitatory stationary stochastic point count process $N_E(t)$ and inhibitory stationary interval process ($I_i$, $f_I(y)$) feed the selective interaction process, giving the response $N_R(t)$]
It follows
$$E\{N_R(t)\} = \left[ \lambda_E - \lambda_I \int_{y=0}^{\infty} \Pr\{N_E(y) \ge 1\}\, f_I(y)\, dy \right] t$$
and approximately (?) via compound distribution results
$$\mathrm{var}\{N_R(t)\} \approx \left[ \sigma_E^2 + \sigma_I^2\, E(\delta_{E,I})^2 + \lambda_I\, \mathrm{var}(\delta_{E,I}) \right] t$$
with sdevs $\sigma_E, \sigma_I$
Compounding the exciting process intervals using the inhibitory process to get the
inter-response distribution is more difficult…but I used arbitrary events
For more detailed results when the excitatory process is Poisson, see my 4 JAP
papers in the 70’s. No model fitting, no simulations – what a pity !
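The kind of simulation that was missing at the time is now only a few lines. The sketch below is an illustration under stated assumptions, not a reconstruction of anything in the papers: both the excitatory and the inhibitory process are taken to be Poisson (the inhibitory process then being a special case of a stationary interval process), each inhibitory event deletes only the next excitatory event, and the function name is hypothetical. Under these assumptions the long-run response rate is λ_E²/(λ_E + λ_I), because the probability of at least one excitatory event in an inhibitory interval is λ_E/(λ_E + λ_I).

```python
import random


def simulate_selective_interaction(rate_e, rate_i, horizon, seed=0):
    """Simulate on (0, horizon]; return (excitatory count, response count).

    Assumption: both processes are Poisson, so the merged stream is Poisson
    with rate rate_e + rate_i and each event is inhibitory with probability
    rate_i / (rate_e + rate_i), independently.
    """
    rng = random.Random(seed)
    total = rate_e + rate_i
    t = 0.0
    inhibited = False            # True once an inhibitory event is pending
    n_excitatory = n_response = 0
    while True:
        t += rng.expovariate(total)
        if t > horizon:
            break
        if rng.random() < rate_i / total:
            inhibited = True     # further inhibitory events add nothing
        else:
            n_excitatory += 1
            if inhibited:
                inhibited = False   # this excitatory event is deleted
            else:
                n_response += 1     # this excitatory event is observed
    return n_excitatory, n_response
```

For example, `simulate_selective_interaction(2.0, 1.0, 20000.0)` gives counts whose response rate can be compared with 2²/(2 + 1) = 4/3 per unit time.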
Diary of Life
1970 – After PhD exam joined David Cox’s weekly PP journal club at IC from Leicester; met Valery Isham, Anthony Atkinson at IC
1970 - Next move - the year 1970/71 at the ‘IBM Thomas J Watson Research
Center’, New York, invited by Peter Lewis
Extended and consolidated PhD work by investigating branching Poisson process
point models for computer failures, and co-organizing big point process conference
1972 – Returned to Leicester for 1 year – moved to Birmingham for
25 years
1973-2004 My Birmingham Years
Henry Daniels
David Wishart
Paul Davies
Phil Bertram
Roger Holder
Frank Downton
Malcolm Faddy
Alan Girling
John Copas
Chris Jones
Richard Atkinson
Frank Critchley
Prakash Patil
Christmas Meal 1981/82
PhilB? FrankD ? Chris Gray AJL AnnieM ChrisJ TriciaC
14
Birmingham Group (when MalcolmF moved back to NZ for second time, 2003)
KamilaZ
WolfgangB
AlanG
PrakashP SaidS MalcolmF
RichardA
15
1973 – Farewell Point Processes
Found research opportunities in hydrology (from teaching with Nath Kottegoda in
Civil Engineering) after devising a course in hydrological time series for Bham MSc in
Hydrology
RSS Read Paper on the topic with Nath Kottegoda (Stochastic Modelling of Riverflow Time Series)
Examined Jane’s PhD on ‘dry’ rivers…
Teaching has influenced my ‘choice’ of research areas quite a bit but not the
reverse
1973 – Hello Time Series – as it was moving into the nonlinear era
Time series started to move away in several directions from ‘Box-Jenkins’ linear
Gaussian models to be able to capture more statistically varied and complex
behaviour
Maurice Priestley, with non-stationary processes and spectra
Howell Tong, with dynamical-statistical thresholds
Robert Engle, Clive Granger, with volatility, co-integration
Peter Lewis et al, with specified nonGaussian models, including discrete distribution
models, simulation in operations research
1980-1990 Worked on non-Gaussian time series models with Peter Lewis, by then
at Naval Postgraduate School, Monterey, California (nice summers)
16
Peter Lewis, 1932-2011
17
1978-80 – Work started with nonGaussian solutions to linear time series models,
exponential, mixed exponential, gamma
1980-87 - Then ways to formulate autoregression operation with nonGaussian
variables – in ways natural to the particular distribution, e.g. convolution and
multiplication, minimization
1989-90 – Non-reversibility, directionality, in nonGaussian linear time series
An early linear problem – it’s easy to set up …(so I describe here)
The AR(1) Innovation Problem
How to specify the error distribution for an AR(1) process with specified marginal
distribution
$$X_t = \rho X_{t-1} + \varepsilon_t$$
Gaver & Lewis made a start with the gamma distribution but could not explicitly obtain
the innovation distribution…
18
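The AR(1) Innovation Problem – ‘epsilon for given X’
$$X_t = \rho X_{t-1} + \varepsilon_t, \quad 0 \le \rho < 1, \quad X_t \sim D, \quad \varepsilon_t \sim \text{??distbn}$$
Solution easy in terms of Laplace transforms – Gaver & Lewis, from
$$\phi_X(z) = \phi_X(\rho z)\, \phi_\varepsilon(z), \qquad \phi_\varepsilon(z) = \phi_X(z) / \phi_X(\rho z)$$
Exponential($\lambda$) solution clear:
$$\varepsilon_t = \begin{cases} 0 & \text{with proby } \rho \\ E_t(\lambda) & \text{with proby } 1 - \rho \end{cases}$$
Gamma solution –>
$$\phi_X(z) = \left( \frac{\lambda}{\lambda + z} \right)^{k}, \qquad \phi_\varepsilon(z) = \left( \frac{\lambda + \rho z}{\lambda + z} \right)^{k}$$
Can you invert this Laplace transform without serendipity ?
‘Consider a shot noise process in continuous time’, of course…
$$\varepsilon_t = \sum_{i=1}^{N} \rho^{U_i} Y_i, \quad N \sim \text{Poisson}(k \log \rho^{-1}), \quad U_i \sim \text{uniform}(0,1), \quad Y_i \sim \text{exponential}(\lambda)$$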
A compound Poisson distribution
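A quick numerical check of the gamma solution is possible by simulating the AR(1) process with the compound Poisson innovation and looking at the marginal moments. This is a sketch only: the function names and parameter values are illustrative assumptions, the Poisson count is drawn by counting unit-rate exponential arrivals, and the gamma marginal is taken as shape k and rate λ, so its mean is k/λ and variance k/λ².

```python
import math
import random


def gamma_ar1_innovation(rho, k, lam, rng):
    """One draw of sum_{i=1}^N rho**U_i * Y_i with N ~ Poisson(k*log(1/rho))."""
    theta = k * math.log(1.0 / rho)
    # Poisson(theta) draw: count unit-rate exponential arrivals in [0, theta].
    n, clock = 0, rng.expovariate(1.0)
    while clock <= theta:
        n += 1
        clock += rng.expovariate(1.0)
    # U_i ~ uniform(0,1), Y_i ~ exponential with rate lam.
    return sum(rho ** rng.random() * rng.expovariate(lam) for _ in range(n))


def simulate_gamma_ar1(n, rho, k, lam, seed=0):
    """Path of X_t = rho*X_{t-1} + eps_t, started from the gamma marginal."""
    rng = random.Random(seed)
    x = rng.gammavariate(k, 1.0 / lam)   # gammavariate takes shape and scale
    path = []
    for _ in range(n):
        x = rho * x + gamma_ar1_innovation(rho, k, lam, rng)
        path.append(x)
    return path
```

With, say, `simulate_gamma_ar1(50000, 0.6, 2.5, 1.5)` the sample mean, variance and lag-1 autocorrelation can be compared with k/λ, k/λ² and ρ.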
19
Diary of Life
1985 - RSS ‘read paper’ on nonlinear AR exponential variables, with Peter Lewis
1986 – ISI Tashkent - Very Sick !
(Time series directionality)
1986 - Began teaching inference in Bham - beginning of regression diagnostics
1986 – Seconded RSS vote of thanks at Cook’s 1986 local influence read paper, and
showed how it applied to regression transformation diagnostics
1988 - JASA paper on regression transformation local influence diagnostics
(To 21, 22)
1988 Got chair in Bham (poster of inaugural lecture)
1989 – Papers on regression transformation score statistics Biometrika papers 1987, 1989-ACA
1991 - IMA Minnesota Robustness & Diagnostics workshop (photos Anthony, Frank) (To 23, 24)
1981-1991 Tim Davis PhD collaboration ‘Survival of Tyres’, Dunlop-Sumitomo-Ford
1991 Tim Davis PhD
1995 - Regression diagnostics – Cook’s bivariate & conditional distance
Gary Brown PhD 1995
1995 - 98 Engine mapping, with Tim Davis, Tim Holiday- PhD-1996
Technometrics paper 1998
1992- Statistical aspects of chaos
1998- Chaos-based communications
took over my research & publication
(To 25)
20
21
(Back to 20)
(Back to 20)
A trip across Minnesota and Iowa with Anthony Atkinson and Frank Critchley to
Spillville, Iowa, to visit Dvorak connections, 1991, on the workshop rest day…
(To 24)
23
Dvorak’s ‘American Quartet’ (String Quartet in F Major, op96) composed here in 1893, also, String Quintet in E Flat Major, op97 (sometimes called the ‘Spillville Quintet’), and after returning to NY, his Humoresque, No 7 in G Flat Major
Spillville, Iowa 1991
25
Back to 20
1992- 2010 Statistical aspects of chaos, leading to
Chaos-based communications
Chaos – instabilities produced by a deterministic rule
‘What got me started’… the Uniform Distribution Solution to the AR(1)
Process – Bartlett’s last paper, probably (another case of the AR(1) innovation problem)
$$U_t = \frac{1}{k} U_{t-1} + \varepsilon_t, \qquad \varepsilon_t = \frac{i-1}{k} \ \text{wp} \ \frac{1}{k}, \quad i = 1, 2, \ldots, k$$
Collaborators
Bala Balakrishna
Alexander Baranovsky
Tohru Kohda
Gan Ohama
Rodney Wolf
Theodore Papamarkou
Nancy Spencer
Atsushi Uchida
Chibisi Chima-Okereke
25
$$U_t = \frac{1}{k} U_{t-1} + \varepsilon_t, \qquad \varepsilon_t = \frac{i-1}{k} \ \text{wp} \ \frac{1}{k}, \quad i = 1, 2, \ldots, k$$
Where is the chaos from this model?
The reverse of this model is the following chaotic and deterministic model
U t 1  (kU t ) mod(1)
deterministic rule called a
chaotic shift map ~ like cnts
congruential random number
generator
And, incidentally, there is a negatively correlated version reversing to
U t 1  {k (1  U t )}mod(1)
It follows (and more) generally that deterministic chaotic processes have statistical
properties, i.e., there are statistical properties of chaos
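The reversal can be checked numerically. The sketch below is illustrative, with hypothetical function names: it generates the uniform AR(1) process forwards in time and confirms that stepping backwards is exactly the shift map. Exact rational arithmetic (`fractions.Fraction`) is an assumption made here so that the equality holds without floating-point rounding.

```python
import random
from fractions import Fraction


def uniform_ar1_path(n, k, seed=0):
    """Forward path of U_t = U_{t-1}/k + eps_t, eps_t = (i-1)/k wp 1/k."""
    rng = random.Random(seed)
    u = Fraction(rng.randrange(10 ** 6), 10 ** 6)   # starting value in [0, 1)
    path = [u]
    for _ in range(n):
        eps = Fraction(rng.randrange(k), k)         # (i-1)/k, i = 1, ..., k
        u = u / k + eps
        path.append(u)
    return path


def reversal_is_shift_map(path, k):
    """True if every earlier value equals (k * later value) mod 1."""
    return all((k * later) % 1 == earlier
               for earlier, later in zip(path, path[1:]))
```

For example, `reversal_is_shift_map(uniform_ar1_path(200, 3), 3)` confirms the relation for k = 3: the random innovation is recovered as the integer part of kU_t, so nothing random remains in reverse time.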
Such ideas prompted some electronic engineers to have the idea of ‘communicating
with chaos’ – instead of communicating with sinusoidal radio waves
26
A particular chaos communication system using a chaotic map is
Chaos Shift Keying (CSK) – ‘Coherent’ Case -simplest
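[Block diagram: transmit one bit $b = \pm 1$ -> chaotic spreading $X_i = \tau(X_{i-1})$, giving $\{X_i\}_{i=1}^{n}$ -> signal $\mu + b(X_i - \mu)$, $i = 1, 2, \ldots, n$ -> channel noise $\{\varepsilon_i\}_{i=1}^{n}$ added -> received signal $R_i = \mu + b(X_i - \mu) + \varepsilon_i$, $i = 1, 2, \ldots, n$ -> decoder, which in the coherent case also has $\{X_i\}_{i=1}^{n}$ available -> decoded bit $\hat{b}$ (estimate $b$)]
Exact theory for bit error rate of such a system, Lawrance & Ohama (IEEE, 2002), with spreading $N = n$ and noise sdev $\sigma$:
$$\mathrm{BER}(N) = \int_{x=c}^{d} \Phi\left( -\frac{1}{\sigma} \left\{ \sum_{i=1}^{N} \left( \tau^{(i-1)}(x) - \mu \right)^2 \right\}^{1/2} \right) f(x)\, dx$$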
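A small Monte Carlo sketch of coherent CSK may help fix ideas. Everything specific here is an assumption for illustration rather than the system in the paper: the logistic map τ(x) = 4x(1 − x) on (0,1) with μ = 1/2, a starting value drawn from its arcsine invariant density, Gaussian channel noise with sdev σ > 0, and a correlation decoder. Under these assumptions the conditional error probability given the spreading sequence is Φ(−√S/σ) with S = Σ(X_i − μ)², so averaging it over simulated sequences gives a value to set against the simulated error rate.

```python
import math
import random


def spreading_sequence(n, rng):
    """n values of the logistic map, started from its arcsine invariant law."""
    x = math.sin(0.5 * math.pi * rng.random()) ** 2
    seq = []
    for _ in range(n):
        seq.append(x)
        x = 4.0 * x * (1.0 - x)
    return seq


def std_normal_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def csk_ber(n, sigma, trials, seed=0):
    """Return (simulated BER, averaged conditional error probability)."""
    rng = random.Random(seed)
    mu = 0.5                                  # mean of the logistic map values
    errors, theory_sum = 0, 0.0
    for _ in range(trials):
        b = rng.choice((-1, 1))               # the transmitted bit
        x = spreading_sequence(n, rng)
        received = [mu + b * (xi - mu) + rng.gauss(0.0, sigma) for xi in x]
        # Correlation decoder: coherent case, so x is known at the receiver.
        corr = sum((r - mu) * (xi - mu) for r, xi in zip(received, x))
        b_hat = 1 if corr >= 0 else -1
        errors += int(b_hat != b)
        energy = sum((xi - mu) ** 2 for xi in x)
        theory_sum += std_normal_cdf(-math.sqrt(energy) / sigma)
    return errors / trials, theory_sum / trials
```

For instance, `csk_ber(5, 1.0, 20000)` returns the two estimates for spreading n = 5 and σ = 1; they should agree up to simulation error.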
Performance of CSK
Assessed by bit error rate (BER)
Depends on statistical aspects of the system as well as the dynamics, according to
previous formula
[Figure: BER curves for different types of chaotic spreading, compared to IID Gaussian: IID Gaussian (worst), shift map, logistic map, and circular map with theoretical lower bound (best)]
Optimum circular map spreading: Ji Yao, T Papamarkou
Area has moved on from chaotic-map and electronic circuitry chaos to laser-chaos communication; this is still a research area but with several experimental
demonstrations and US military applications
28
Current work with Atsushi Uchida and Chibisi Chima-Okereke
-> Police mergers
A Brief Diversion - In the Press…
Police Mergers 2006 – the misuse of statistics
[Scatter plot with fitted line: Total Score by Force (excluding London); x-axis Force Size (Officer Strength), 0 to 9000; y-axis Total Score, 0.00 to 80.00. Annotations: line at total score 63 (‘Line equates to an average score of 3’); force size 4000 marked; ‘p is significant to the 0.01 level’; ‘R2 = 0.5809’]
Charles Clarke.
Home Secretary
Government O’Connor Report said: ‘This strongly suggests that forces with over
4,000 officers (or 6,000 total staff) tend to meet the standard across the range of
services measures in that they demonstrate good reactive capability with a clear
measure of proactive capacity…’.
29
What I said about the O’Connor Report:
[Scatter plot with fitted line: Total Score by Force (excluding London); x-axis Force Size (Officer Strength), 0 to 9000; y-axis Total Score, 0.00 to 80.00. Annotations: line at total score 63 (‘Line equates to an average score of 3’); ‘p is significant to the 0.01 level’; ‘R2 = 0.5809’]
What this plot shows to me is:
Rather rough upward scatter of points
Least squares line is misleading because of extremes
Large variability at each force size – very important
Line at 63 shows most forces ‘fail’ – artifact of scoring and choice of ‘3’
Meaningless statistical elaborations of p-value and R-squared due to automatic use of
software
No justification of 4,000 figure
30
Another example of rubbish in the O’Connor report
What I said about this plot ‘This is an almost perfect example of how not to present a
graph - no scales on either axis, no data plotted to justify the lines drawn. It is almost
impossible to obtain any critical understanding from it, except that it is intended to
prove that score for protective capability increases with force size’
[Figure: Overall Trend for Protective Services; lines for Serious & Organised, Public Order, Critical Incidents, Civil Contingencies, Roads Policing, Major Crime, CT & DE; y-axis Score; x-axis Force Size (smallest from left)]
31
What was said in the House of Commons:
MP David Davis: ….Frankly, the best that I can do is to repeat to the House the
coruscating opinion of Professor Lawrance, a professor of statistics at Warwick
University…
MP Adrian Bailey: …I rather regret the attempt by the University of Warwick to
rubbish the statistical basis and the credibility of that report. It has a good
pedigree and I shall make my judgement on the balance of professional police
opinion, rather than on the opinion of university professors in Warwick…
Another newspaper appearance…
32
A Publication in ‘The Sun’… - 14th October 2013
33
A Publication in ‘The Sun’… - 14th October 2013
A MATHS professor has told The Sun bills are so complicated even he can’t understand them. Tony Lawrance,
right, of Warwick University said: “They’re absurdly over-complicated. Most professors would find them difficult
to understand – the public doesn’t stand a chance.”
34
Chaos-based Communications 2001 – 2014 - ??
[Photo: with Bala Balakrishna, Cochin University, Kerala]
Collaborators:
Bala Balakrishna
Gan Ohama
Rachel Hilliam
Ji Yao
Theodore Papamarkou
Chibisi Chima-Okereke
Atsushi Uchida
Current work: laser-chaos-based communications
(laser = light amplification by stimulated emission of radiation)
Key laser features of laser-based communication
1. Lasers can produce chaotic waves which look stochastic – (use semiconductor laser with optical feedback)
2. Lasers producing chaotic behaviour can be synchronized by a trigger signal
A message is hidden in a segment of the chaotic laser sequence - steganography,
rather than cryptography when a message is visible but has to be decoded
35
Current Work-1: Laser-based Chaos Communication
Experimental data via collaboration with Atsushi Uchida, Saitama University,
Tokyo, and analysis collaboration with Chibisi Chima-Okereke of
ActiveAnalytics, Bristol
Experiment set up to probe chaos shift-keying system of communication using semiconductor lasers with optical feedback and transmission through 60m fibre optic cable
Each set of data consists of three time series of 10m values
[Diagram of the experimental setup; labels include ‘binary message b’ and ‘binary message’]
36
Experimental setup not quite so simple as it may have seemed…
37
Some Experimental Results
[Figure: Adjusted Received and Synchronized Laser Signals (5,000,001:5,000,500), an example of laser synchronization; x-axis Time Index - 5m, 0 to 500; y-axis intensity, about -0.6 to 0.3]
[Figure: density plots of Intensity of Drive Laser and of Adjusted Optical Noise; x-axis Intensity, -0.8 to 0.3; y-axis Density]
Is Optical Noise Independent ?
Based on post-processing for instrument effects – Noise not Gaussian
38
Distribution of Optical Noise Conditional on Driver Signal Strength
[Figure: Boxplots of Optical Noise versus Drive Signal Strength; x-axis drive signal strength in bins from -0.115 to 0.235; y-axis noise, -0.10 to 0.15]
N.B. BER v SNR plot under development, but initial work indicates
acceptable values can be obtained using range of SNR controlled by
range of spreading
39
Current Work-2:
Volatility Modelling and Exploratory Graphics
Topic comes from teaching financial time series in the Financial Mathematics
masters program
Financial time series ‘means’ volatility modelling
Volatility is changing conditional variance $\mathrm{var}(X_t \mid X_{t-1})$ in a time series
Motivation – volatility models are routinely used without justification of the type of
volatility structure existing in the data series
But it has not been clear how to reveal volatility structure
Attitude has been ‘fit the model you think will be ok and undertake some
general tests of its fit’ - but never obtain the empirical volatility and compare
it with the model volatility
My attitude is ‘get an empirical version of the volatility function and choose
a model which gives a good volatility fit, i.e. get the volatility right first’ - maybe not the purest of likelihood approaches – but surely volatility is the
most important aspect of volatility models !
The General Volatility Model to be used
$$X_t = \mu(X_{t-1}) + \sigma(X_{t-1})\, \varepsilon_t, \qquad \varepsilon_t \sim IID(0,1)$$
40
FTSE100 Daily Data
4th Jan 2005 – 10th Feb 2011
Daily Adjusted Closing Values and Daily Returns
[Figure: two time-series panels against Daily Date, 01/01/2005 to 01/01/2011. Upper panel: FTSE Values, 4000 to 8000. Lower panel: Returns, -10% to 10%]
41
Journal of the Royal Statistical
Society, Series C, Applied Statistics
(2013) 62, Part 5, pp. 669-686
Volatility Graphics
Based on the general volatility model for returns
$$X_t = \mu(X_{t-1}) + \sigma(X_{t-1})\, \varepsilon_t, \qquad \varepsilon_t \sim IID(0,1)$$
$\sigma(X_{t-1})$ = volatility function
Graphics Steps
calculate $\hat{v}_t = |x_t - \hat{\mu}(x_{t-1})|$ (unscaled individual volatilities) ($\hat{\mu}$ nearly constant with returns)
$\mathrm{smo}(\hat{v} \mid x_{t-1})$ (smoothed unscaled individual volatilities)
Smoothed & scaled i-volatilities $\hat{\sigma}(x_{t-1})$ give empirical version of volatility function
$$\hat{\sigma}(x_{t-1}) = \mathrm{smo}(\hat{v} \mid x_{t-1}) \times \text{scaling}$$
scaling gives standardized innovations
$$\text{scaling} = \left[ \frac{1}{n-1} \sum_{t=2}^{n} \left\{ \frac{x_t - \hat{\mu}(x_{t-1})}{\mathrm{smo}(\hat{v} \mid x_{t-1})} \right\}^{2} \right]^{1/2}$$
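The graphics steps can be sketched in code. This is an illustration under assumptions that are not in the paper: returns are simulated from an ARCH(1)-type model with σ(x)² = 0.5 + 0.5x², the mean function is replaced by the overall sample mean (taken as nearly constant), and the smoother is a simple running mean over neighbours in previous-return order rather than whatever smoother the paper uses. All function names are hypothetical.

```python
import math
import random


def simulate_returns(n, a0=0.5, a1=0.5, seed=0):
    """Returns with zero mean and volatility sqrt(a0 + a1 * x_{t-1}**2)."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        sigma = math.sqrt(a0 + a1 * x[-1] ** 2)
        x.append(sigma * rng.gauss(0.0, 1.0))
    return x


def running_mean_smooth(keys, values, half_width):
    """Smooth values against keys by averaging over neighbours in key order."""
    order = sorted(range(len(keys)), key=keys.__getitem__)
    prefix = [0.0]
    for j in order:
        prefix.append(prefix[-1] + values[j])
    smooth = [0.0] * len(keys)
    for rank, j in enumerate(order):
        lo = max(0, rank - half_width)
        hi = min(len(order), rank + half_width + 1)
        smooth[j] = (prefix[hi] - prefix[lo]) / (hi - lo)
    return smooth


def empirical_volatility(x, half_width=100):
    """Return (previous returns, residuals, empirical volatility values)."""
    mu_hat = sum(x) / len(x)                 # constant mean-function estimate
    prev = x[:-1]
    resid = [xt - mu_hat for xt in x[1:]]
    abs_resid = [abs(r) for r in resid]      # unscaled individual volatilities
    smo = running_mean_smooth(prev, abs_resid, half_width)
    # Scale so that the standardized innovations have mean square one.
    scaling = math.sqrt(sum((r / s) ** 2 for r, s in zip(resid, smo))
                        / len(resid))
    return prev, resid, [scaling * s for s in smo]
```

Plotting the third returned list against the first gives an empirical volatility function, which for these simulated data should rise away from a previous return of zero.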
42
Scaled Individual Volatilities and Their Smooth
[Figure: scaled individual volatilities with their smooth, the empirical volatility function; x-axis Previous Return, -7 to 7; y-axis Volatility, 0 to 12]
43
(see 2013 JRSS ‘C’ paper for more details)
Bootstrapping the Volatility Function
[Figure: bootstrapped volatility functions; x-axis Previous Return, -7 to 7; y-axis Volatility, 0 to 12]
That’s Enough, except…
44
The one nice thing about getting older
is that younger people follow you…
45
46
Many thanks
47