(John Hopkins) - Far Hills, May 2013

Daily Scratch Rating (DSR)
PLAYING CONDITIONS
Objective: The system should reflect
variations in playing conditions
(Principle 6).
Why is it important to assess variations
in playing conditions?
- It is axiomatic amongst the designers of all
handicap systems that a gross score must first
be standardised before it can be used for
handicapping.
- Standardisation enables us to meaningfully
assess the value of a score, and to meaningfully
compare it with all other scores. For example, is
78 a good score? In order to answer the
question we need to know the difficulty of the
course.
Why is it important to assess variations
in playing conditions? Contd.
- The objective of a course rating system is to
enable us to standardise scores.
- Course ratings are intended to precisely
measure the difficulty a course presents to a
golfer in the playing of their round.
Why is it important to assess variations
in playing conditions? Contd.
- If the rating of a course is not a true reflection
of the difficulty it presented to a golfer in the
playing of a round, the player’s standardised
score for that round will be inaccurate.
- If the standardised score is inaccurate, the
player’s handicap will be distorted (i.e. if the
inputs are inaccurate, the output must be
inaccurate too).
Why is it important to assess variations
in playing conditions? Contd.
- Does the difficulty of a course vary from day
to day?
- We know this can happen.
- Daily fluctuation can be caused by changed
hole placements, varying green speeds &
green firmness, and changed weather.
Why is it important to assess variations
in playing conditions? Contd.
- So how often will a rating be accurate if it is
static and unable to vary in line with changes
in course difficulty from day to day?
- GA’s statisticians have conducted extensive
analysis of the comprehensive repository of
competition data in the GOLF Link database.
Why is it important to assess variations
in playing conditions? Contd.
- The statisticians have looked for competitions
where the weighted average net score varies
by 1 stroke or more from the average net
score that would be expected from the given
composition of players.
- The weighting is for field size – for small fields
the variance from the expected average net
score will need to be greater than 1 stroke.
[More detail on weighting for field size is
contained later in the presentation.]
Why is it important to assess variations
in playing conditions? Contd.
- Could it be that whole fields are just having a
good or bad day?
- Our statisticians have factored into their
analysis the standard deviation of player
scores.
- They have assessed that the likelihood of
large fields simply playing well, or playing
poorly, is very low (and the weighting for field
size accommodates the potential for this to
happen with small fields).
Why is it important to assess variations
in playing conditions? Contd.
- Our statisticians have assessed that actual
course difficulty aligns with the static Scratch
Rating on only 33% of occasions.
- Are we suggesting that a daily rating system
can achieve 100% accuracy?
- No, but 90-95% is a substantial improvement
on 33%.
Why is it important to assess variations
in playing conditions? Contd.
- But if a rating is sometimes a little high and
sometimes a little low, does it all not just
‘even out’ under a static Scratch Rating
System?
- IMPACT ON NATIONAL TRENDS &
AVERAGES. DSR will make handicaps in
general less volatile.
- Why? At present, good scores on easy days produce larger downward movements than they will
under DSR, because the course rating stays artificially high under the static Scratch Rating system.
The player’s handicap dips temporarily below where it should be, the player struggles to play to the
new value, and the handicap then drifts out again. The effect is accentuated by scores on hard days
being evaluated against ratings that are artificially low. A further consequence of decreased
volatility is that the average handicap across Australia will increase very slightly.
Why is it important to assess variations
in playing conditions? Contd.
- INDIVIDUAL IMPACT. Whilst the impact on
national averages is important, in this case
it only paints a small part of the picture.
Examples of the impact that not having a
DSR has on individual golfers are as
follows:
• Bill always plays in the morning when it’s calm, and Tony always plays in the afternoon when it’s
windy. Without DSR, both players have their scores evaluated against incorrect course ratings
(one too high, the other too low). The two cases net off against each other to give an average
that looks perfect. However, this doesn’t help Bill, whose handicap will always be too low, and
Tony won’t be happy either because he’s trying to improve his handicap and yet it’s always kept
artificially high!
Why is it important to assess variations
in playing conditions? Contd.
• Jenny’s underlying ability doesn’t change from summer to winter but because of the heavier
winter conditions, she doesn’t score as well. Without DSR, Jenny’s handicap increases in winter
and then decreases again in summer. Jenny doesn’t play much at the start of summer, so when
she does start to play regularly at the height of summer her handicap is artificially high compared
to those players whose handicaps have already adjusted to the easier conditions.
• John’s handicap is only based on an average of 8 scores. As a result, it doesn’t take too many
inaccurate ratings to distort his handicap at any given point in time. Distortion can be caused
either by ‘top 8’ scores that were actually returned against artificially high ratings, or by scores in
the worst 12 that would have been in the top 8 if they had been assessed against ratings more
reflective of the true course difficulty than the Scratch Rating. Whilst the ups and downs may
even out over the course of a substantial number of rounds, it seems hopeful to expect they will
even out at any given point in time.
• Fiona has been playing in the Melbourne winter. Without DSR, her handicap will increase
because of the harder conditions. She visits her friend in tropical Cairns and plays golf. Coming
out of the Melbourne winter with her artificially high handicap, Fiona wins the Cairns competition.
Why is it important to assess variations
in playing conditions? Contd.
- But if a golfer plays on a very difficult day,
won’t a poor score fall into the worst 12 of
their most recent 20 scores? And if so, isn’t
the need to adjust the course rating
negated? If this is correct, why do we need
DSR?
• Firstly, handicap golfers are just as capable of playing well on difficult days as they are on easy
days. And whilst the score may be worse on a hard day, it is not necessarily because the quality
of the round is worse. In a proportionate number of cases it is instead because the difficulty of
the course was increased. As a result, a round should not be dismissed just because the
conditions were difficult and the score itself is commensurately worse.
Why is it important to assess variations
in playing conditions? Contd.
• Secondly, a handicap system should be flexible enough to accurately assess the standard of
rounds played by players under varying conditions.
• Thirdly, a score may be assessed as poor when compared against the static Scratch Rating, but it
may be in a player’s top 8 if it is compared against the true rating.
• Fourthly, if a static Scratch Rating is being used, a score from a difficult day that is in a player’s
top 8 will be assessed as being worse than it should be as it is being compared against an
artificially low rating. This will make the handicap higher than it should be.
• Fifthly, if a static Scratch Rating is being used, a score from an easier day that is in a player’s top
8 will be assessed as being better than it should be as it is being compared against an artificially
high rating. This will make the handicap lower than it should be.
• Sixthly, it is possible that the swings and roundabouts of the above may cancel each other out.
However, a sample of 8 is not large and whilst the ups and downs may even out over the course
of a substantial number of rounds, it seems hopeful to expect they will even out at any given point
in time.
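The effect described in these points can be sketched with a small, entirely hypothetical set of 20 rounds. The sketch assumes a simplified handicap measure (the plain average of the best 8 differentials of the last 20, with no Slope or other adjustments), a static rating of 72, and invented scores and daily ratings; none of these figures come from GA data.

```python
from statistics import mean

STATIC_RATING = 72  # hypothetical static Scratch Rating

def best_8_average(differentials):
    """Average of the 8 lowest differentials (simplified handicap measure)."""
    return mean(sorted(differentials)[:8])

# Hypothetical rounds as (score, true daily rating) pairs:
# 10 rounds on easy days (course played to 70), 10 on hard days (played to 74).
easy_days = [(score, 70) for score in (86, 87, 88, 89, 90) * 2]
hard_days = [(score, 74) for score in (90, 91, 92, 93, 94) * 2]
rounds = easy_days + hard_days

# Every score measured against the static rating.
static_average = best_8_average([score - STATIC_RATING for score, _ in rounds])

# Every score measured against the rating for the day it was played.
daily_average = best_8_average([score - rating for score, rating in rounds])

print(static_average)  # 15.5: the top 8 are all easy-day scores
print(daily_average)   # 16.5: easy-day and hard-day rounds are valued equally
```

In this made-up case the static rating fills the top 8 entirely with easy-day rounds, so the simplified average sits a full stroke lower than it does when each round is measured against the day's rating.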
Why is it important to assess variations
in playing conditions? Contd.
• But how much does course difficulty really vary
from day to day and with the seasons? Below
is a graph of DSRs that have been calculated
from scores at a typical Australian golf club.
[Graph: DSRs calculated from scores at a typical Australian golf club]
Why is it important to assess variations
in playing conditions? Contd.
• The graph shows clear fluctuations in course
difficulty from day to day.
• The graph also shows clear seasonal
fluctuations in course difficulty.
• But how much of an impact does all of this
have on the calculation of handicaps?
• One measure is to compare the proportion of
Anchored players before and after the
introduction of DSR by using the data from our
typical Australian club.
[Graphs: proportion of Anchored players at the typical Australian club before and after the introduction of DSR]
Why is it important to assess variations
in playing conditions? Contd.
- So we can see that providing ratings that align
with the actual difficulty presented to the golfer
in the playing of their round does make a
material difference to handicaps.
- (Note: Approx 70% of the reduced Anchorage
impact is caused by DSR (the remainder is
caused by a mix of Slope and the Stableford
Handicapping Adjustment).)
Exactly what is being taken into
consideration when making the DSR
calculation?
- DSR inputs are:
• Gender
• Type of competition (Stroke, Stableford, Par)
• Field size
• Average handicap of field
• Average net score of field
- The suite of DSR algorithms is provided at
Appendix A.
Exactly what is being taken into
consideration when making the DSR
calculation? Contd.
The use of the Mean Net Score for a given
Competition
• The Mean Net Score expected from a
given competition (under normal
conditions) is able to be well determined by
establishing the average handicap of the
field, and type of competition (Par,
Stableford, or Stroke), and whether the
competition was for Men or Women.
Exactly what is being taken into
consideration when making the DSR
calculation? Contd.
• This expected Mean Net Score will be less
than the net score players would need to
achieve in order to play to their handicaps.
Let’s call this difference the Normal
Deduction.
• It is related to handicaps as follows: [see
next slide]
Exactly what is being taken into
consideration when making the DSR
calculation? Contd.
Exactly what is being taken into
consideration when making the DSR
calculation? Contd.
• The difference between the Actual Mean
Net Score and the Expected Mean Net
Score is a first approximation to an
adjustment that should be made for
handicap purposes to account for Course
Conditions.
• It will be positive on an easy day, and
negative on a difficult day.
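As a rough sketch of this first approximation for a Stableford competition, the snippet below uses the Men's Stableford coefficients listed in Appendix A. It assumes that the bracketed table values are used as positive magnitudes and subtracted, as they are in the single DSR formula in Appendix A; the function names and the example field are hypothetical.

```python
# Sketch only: Stableford case, Men's Stableford coefficients from Appendix A.
# Assumption: the bracketed values (0.111) and (3.498) are treated as positive
# magnitudes and subtracted, as in the single DSR formula in Appendix A.
M_MEN_STABLEFORD = 0.111
B_MEN_STABLEFORD = 3.498

def normal_deduction(avg_handicap, m=M_MEN_STABLEFORD, b=B_MEN_STABLEFORD):
    """Normal Deduction ND = mH + b for a field with average handicap H."""
    return m * avg_handicap + b

def expected_mean_points(avg_handicap, par, scratch_rating, cpa=0.0):
    """Expected mean Stableford points under normal conditions."""
    return 36 + par - scratch_rating + cpa - normal_deduction(avg_handicap)

def raw_condition_adjustment(actual_mean_points, expected_points):
    """First approximation: positive on an easy day, negative on a hard day."""
    return actual_mean_points - expected_points

# Hypothetical field: average handicap 15, Par 72, Scratch Rating 72, CPA 0.
expected = expected_mean_points(15, par=72, scratch_rating=72)
print(round(expected, 3))                                  # 30.837
print(round(raw_condition_adjustment(32.0, expected), 3))  # 1.163 (easy day)
```

This value is not yet the adjustment that is applied; it still has to be weighted for field size.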
The Weighting Factor and how the
system can work even for very small
fields.
• However, it is necessary to give the value
above a weight, which accounts for field
size, and which is also dependent on the
average handicap of the players in the
field. This weighting factor is close to 100%
for large field sizes, but as low as 25% for
a field size of one.
• The following table gives some examples:
[see next slide]
The Weighting Factor and how the
system can work even for very small
fields. Contd.
• Only modest field sizes are needed to get a good
result.
• A DSR can be calculated from a dataset of 1 player.
Weight    Field size
          Men’s Stableford          Women’s Stroke
          (average handicap 15)     (average handicap 25)
20%       2                         4
50%       10                        16
80%       40                        60
90%       90                        140
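The table above can be approximately reproduced from the Weighting Factor formula and coefficients given in Appendix A. The following is a minimal sketch under that assumption; the function name is hypothetical, and the table values appear to be rounded, so the match is approximate rather than exact.

```python
def weighting_factor(field_size, avg_handicap, m_prime, b_prime, csd=1.5):
    """Weighting Factor = n / ((m'H + b')^2 / CSD^2 + n), as in Appendix A."""
    score_sd = m_prime * avg_handicap + b_prime
    return field_size / (score_sd ** 2 / csd ** 2 + field_size)

# Men's Stableford (m' = 0.057, b' = 3.841), average handicap 15.
for n in (2, 10, 40, 90):
    print(n, round(weighting_factor(n, 15, 0.057, 3.841), 2))

# Women's Stroke (m' = 0.081, b' = 3.889), average handicap 25.
for n in (4, 16, 60, 140):
    print(n, round(weighting_factor(n, 25, 0.081, 3.889), 2))
```

The weight rises quickly at first and then flattens, which is why modest field sizes already carry most of the available weight.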
The Weighting Factor and how the
system can work even for very small
fields. Contd.
• So we ascribe a greater weight to the
average net score of a large field than
we do to the average net score of a
small field.
• But how have we related field size to
confidence? Our statisticians have
used Bayes’ Theorem.
The Weighting Factor and how the
system can work even for very small
fields. Contd.
• From Bayes’ Theorem is derived the
formula for weighting the Daily Estimate
compared to the measured Scratch
Rating.
• Bayes proved that it is optimal. There
can be no better estimate of the Daily
Condition than the Bayes estimate.
• More information on Bayes’ Theorem is
provided at Appendix B.
When is the cut-off for scores to be
submitted in order for the DSR to be
calculated for a given day?
• There is no hard and fast rule.
• GA’s direction to its clubs is that scores for
a given day are to be processed through
GOLF Link as soon as is practicable
(irrespective of whether a daily rating
system is in operation).
• For some clubs this will be on the day of
play (preferable).
• For other clubs this will be within a week of
the day of play (acceptable).
How will the system work in practice in
terms of updating players’ handicaps on
GOLF Link?
- GOLF Link will calculate all Differentials
against the DSR, NOT the Scratch Rating.
- DSR will not provide any administrative
impost on clubs.
- The club will enter the scores of players into
GOLF Link (in the same way it would if DSR
was not in operation), press the button, and
GOLF Link will perform all the necessary
calculations.
What is the implementation protocol for
competitive play and for extra day
scores?
- DSRs will be calculated for competition
scores AND extra day scores.
- All extra day scores returned at a course on
a given day will be processed through
GOLF Link as a single Batch (in the same
way that competition scores will be
processed).
- If an extra day score is not processed in the
appropriate Batch, it will still be eligible to be
used for handicapping.
Comparison with other methods used
worldwide.
- EGA and CONGU both operate a daily
rating component.
- EGA and CONGU both treat as indicative
the proportion of players in each handicap
grade to return a good score.
- DSR uses all scores (ie good AND poor).
- The analysis performed by GA’s statisticians
led them to the conclusion that there is
material value in using good scores AND
poor scores.
Comparison with other methods used
worldwide. Contd.
- The CONGU component operates by
adjusting the scratch rating against which a
player’s score is compared (as will DSR).
- The EGA component operates ostensibly by
expanding or contracting the buffer zones
(however it effectively operates by adding or
subtracting a value from a player’s net
score so as to account for a variation in
conditions).
Comparison with other methods used
worldwide. Contd.
- CONGU and EGA prefer a greater degree
of statistical certainty than GA in order for
the daily rating to vary from the Scratch
Rating.
- After consideration, GA took the view that a
more dynamic approach is more likely to
yield outputs that will align with the golfer’s
view of the difficulty of the course on the
day.
How easy is DSR to understand?
- The concepts underpinning the EGA &
CONGU daily rating components are all
readily understood by golfers.
- GA firmly expects that golfers will readily
understand the concepts underpinning
DSR.
How easy is DSR to understand? Contd.
- The mathematics underpinning these
methods are more esoteric.
- GA believes there are two key determinants
of whether a daily rating system is accepted
by golfers. Firstly, the degree to which it is
conceptually understood. Secondly, its
ability to produce intuitive outputs.
- Golfers want ratings that align with their
view of the difficulty of the course on the
day.
Is there a simpler daily rating solution?
• CCR
• DSR
Is there a simpler daily rating solution?
Contd.
- In Australia we previously operated a simple
mathematical model (CCR) where the
12½% net score became the daily rating.
- CCR was necessarily simplistic (as it
operated prior to the age of universal
computer use) and it was statistically
inefficient.
Is there a simpler daily rating solution?
Contd.
- The mathematics of CCR were readily
understood by golfers and administrators.
- However the commonly-held view of golfers
(and administrators) was that CCRs were
often more a reflection of the quality of the
field than they were of the difficulty of the
course (eg veterans’ fields would produce
higher CCRs than regular handicap fields).
Is there a simpler daily rating solution?
Contd.
- CCR did not work so well for clubs in
regional areas or for women’s fields, ie in
cases where fields were frequently small.
- A further problem with women’s fields was
that they typically exhibited materially higher
average handicaps than do men’s fields.
- Our statisticians are supremely confident
that DSR will deliver materially better
outcomes for our golfers than did CCR.
Results of pilot study.
- DSR was developed by the Daily Rating
Statistical Review Group, comprising
experienced administrators and highly
accomplished statisticians.
- It was laboratory tested against millions of
rounds.
- It has also been trialled in a live
environment in a diverse selection of
Australian clubs.
Results of pilot study. Contd.
- There have been three phases to the live
trial.
- The first two phases were held across
different seasons.
- We are currently engaged in the 3rd and
final phase.
- The live trial has involved a diverse
selection of 20 clubs (regional and
metropolitan, large and small).
Results of pilot study. Contd.
- DSR values have been calculated each day
and emailed to officials at each trial club.
- These officials were asked to comment on
the degree to which the calculated DSRs
have aligned with their intuitive view of the
difficulty of the course on the day.
- The feedback from the live trial clubs has
been very positive and extremely
encouraging.
APPENDIX A – The DSR Algorithms
Normal Deduction: ND = mH+b
Where H is the average handicap of the field, m and b are taken from the table below, representing the
slope and intercept of the straight line of best fit. R is the correlation of this fit.
               Par P      Stableford S   Stroke K
Men      m     (0.052)    (0.111)        (0.124)
         b     (2.777)    (3.498)        (4.372)
         R     0.943      0.972          0.973
Women    m     (0.062)    (0.117)        (0.146)
         b     (2.514)    (3.338)        (3.939)
         R     0.959      0.953          0.984
Weighting Factor = n / ((m'H + b')^2 / CSD^2 + n)
Where n is the field size, H is the average handicap of the field, CSD (the Course Standard Deviation) is
estimated at 1.5, and m’ and b’ are taken from the table below representing the slope and intercept of the
straight line of best fit for the empirically derived standard deviation of the Normal Deduction.
               Par P      Stableford S   Stroke K
Men      m'    0.023      0.057          0.083
         b'    3.194      3.841          3.993
         R     0.856      0.829          0.956
Women    m'    0.027      0.057          0.081
         b'    3.040      3.775          3.889
         R     0.760      0.917          0.938
CPA (Course Parameter Adjustment) = Prior CPA + WCA x 0.02 x (0.7 for Men, 0.5 for Women)
Where WCA (Weighted Condition Adjustment) is SR (Scratch Rating) minus DSR.
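The CPA update can be sketched as a one-line running adjustment. This is only an illustration of the formula as written above; the function name and the example ratings are hypothetical.

```python
def update_cpa(prior_cpa, scratch_rating, dsr, gender):
    """CPA = Prior CPA + WCA x 0.02 x (0.7 for Men, 0.5 for Women)."""
    wca = scratch_rating - dsr  # Weighted Condition Adjustment = SR - DSR
    gender_factor = 0.7 if gender == "men" else 0.5
    return prior_cpa + wca * 0.02 * gender_factor

# Hypothetical day: SR 72.0, DSR 71.0 (course played one stroke easier), men.
print(round(update_cpa(0.0, 72.0, 71.0, "men"), 3))  # 0.014
```

Each day moves the CPA by only a small fraction of that day's adjustment, so it changes slowly over many competitions.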
Putting DSR into a single formula:
DSR = SR - (S - (36 + Par - SR + CPA - mH - b)) x n / ((m'H + b')^2 / CSD^2 + n)
Where S is the actual average Stableford points scored in the competition, and the other symbols have
their meaning as above.
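A minimal sketch of the single formula for a Stableford competition follows. It assumes the bracketed m and b values from the Normal Deduction table are used as positive magnitudes, and it uses a hypothetical men's field (Par 72, SR 72, CPA 0, average handicap 15, 40 players averaging 32.0 points); the function name is also an assumption.

```python
def daily_scratch_rating(sr, par, mean_points, avg_handicap, field_size,
                         m, b, m_prime, b_prime, cpa=0.0, csd=1.5):
    """DSR for a Stableford competition, following the single formula above."""
    # Expected mean Stableford points under normal conditions.
    expected_points = 36 + par - sr + cpa - m * avg_handicap - b
    # Weighting Factor for field size and average handicap.
    weight = field_size / ((m_prime * avg_handicap + b_prime) ** 2 / csd ** 2
                           + field_size)
    # An easy day (more points than expected) lowers the rating below SR.
    return sr - (mean_points - expected_points) * weight

# Men's Stableford coefficients: m = 0.111, b = 3.498, m' = 0.057, b' = 3.841.
dsr = daily_scratch_rating(sr=72.0, par=72, mean_points=32.0, avg_handicap=15,
                           field_size=40, m=0.111, b=3.498,
                           m_prime=0.057, b_prime=3.841)
print(round(dsr, 2))  # about 71.07
```

When the field scores exactly the expected mean, the function returns the Scratch Rating unchanged, whatever the field size.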
APPENDIX B – Bayes’ Theorem
BAYES’ THEOREM
Bayes’ Theorem starts with an estimate of some value you’d like to ascertain; then it assumes that you
can take a sample which gives a further but limited estimate of the value; from these pieces of data the
formula gives a final estimate which combines the initial estimate with the sample result. It effectively
gives the amount of weight that can be given to the sample compared to the initial estimate, this in turn
being based on the sample size and the inherent variability of the estimate and the sample.
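One standard way to make this weighting concrete is the normal case. The following is an illustration under assumed normal distributions, not necessarily the derivation GA's statisticians used: the initial estimate is treated as a prior with mean \mu_0 and standard deviation \tau, and the sample as n observations with standard deviation \sigma and mean \bar{x}.

```latex
\theta \sim N(\mu_0, \tau^2), \qquad
\bar{x} \mid \theta \sim N\!\left(\theta, \frac{\sigma^2}{n}\right)

\hat{\theta} = \mu_0 + w\,(\bar{x} - \mu_0), \qquad
w = \frac{n\tau^2}{n\tau^2 + \sigma^2} = \frac{n}{\sigma^2/\tau^2 + n}
```

With \tau read as the CSD and \sigma as m'H + b', the weight w has the same form as the Weighting Factor in Appendix A.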
EXPLANATORY NOTES ON BAYES’ THEOREM
Suppose there is a true answer to a question.
Let’s take two examples to demonstrate:
1. What is the proportion of voters that will vote for the Republicans in the next election?
2. What is the true scratch rating of a golf course on which I will play tomorrow? Due to conditions,
the course may play differently to the static scratch rating which may be considered as an average
or normal rating.
In the case of the voters, every four years the question is answered. But along the way, there are many
polls of voters’ intentions. Eventually we will know the truth.
In the case of the course, we will never know the true answer. We could only know it if all the golfers in
the land actually played on the course tomorrow, which is clearly impossible.
When we take a poll of voters, the smaller the sample, the less accurate the poll result. The greater the
sample, the greater the accuracy, until on voting day, we get complete accuracy.
When we take a sample of golfers’ scores tomorrow, we will get a “poll”, and by comparing their average
score to what would be expected for a field of that composition, we will get an estimate of whether the
course played easier, harder, or no different to normal. The more golfers in the sample, the better the poll
result. If the average net score is higher than expected, then this is a poll result suggesting that the
course played harder.
Bayes however, was able to build in the concept of having an idea of what might be expected as an
answer before you take the sample. In the case of voters, it might be voting patterns at the last election.
And you can measure how variable this has been over the years. In the case of a daily scratch rating it
might be the static scratch rating ascribed to the course.
He then said we will only depart from the first idea if there is enough evidence to prove that there is
indeed a better answer. So, in the case of voters in an area which in the past has voted 55% Republican,
finding that only 30% of a sample of 100 voters intends to vote Republican at the next election, may not
be convincing. The same result with a sample of 10,000 voters may be very significant. Bayes’ theorem
allows the optimal weight to be ascribed to the poll result, and to determine if this result is significant.
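The voter example can be sketched numerically with a Beta prior on the Republican share, which is one common way of applying Bayes' theorem to a proportion. The prior strength of 100 "pseudo-voters" is purely an assumption chosen for illustration; the 55% and 30% figures and the two sample sizes are those used above.

```python
def posterior_share(prior_share, prior_strength, sample_share, sample_size):
    """Posterior mean of a proportion under a Beta prior (Beta-binomial model).

    prior_strength is the number of pseudo-observations the prior is worth
    (an assumed value, not something given in the text).
    """
    weight = sample_size / (prior_strength + sample_size)
    return prior_share + weight * (sample_share - prior_share)

# Prior: 55% Republican, assumed to be worth 100 voters' evidence.
small_poll = posterior_share(0.55, 100, 0.30, 100)     # poll of 100 voters
large_poll = posterior_share(0.55, 100, 0.30, 10_000)  # poll of 10,000 voters

print(round(small_poll, 3))  # 0.425: halfway between the prior and the poll
print(round(large_poll, 3))  # 0.302: almost entirely the poll result
```

Under these assumptions the small poll moves the estimate only halfway towards 30%, whereas the large poll all but replaces the prior.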
In the case of the daily scratch rating, application of the theorem allows us to come up with an estimate
of the daily rating, but only lets it deviate from the static rating if, firstly, the average scores indicate a
deviation and, secondly, the number of players making up the sample is sufficient to make a different
result the most likely on the balance of probabilities. Of course the poll of golfer scores may indicate that
there is no case for deviation at all, either because the sample is too small, or because the average of
their scores was quite close to what would be expected. But if there is a deviation, after the Bayes
weighting is applied, we can be certain that this estimate of the daily scratch rating is the optimal value
available based on the evidence. It is more likely to be a reflection of the true daily rating on that day
than the static rating.
One might say that Bayes’ theorem allows us to anchor the new answer (the daily scratch rating) firmly to
the original idea (the static scratch rating), and, conservatively, only allows deviation when the evidence
is overwhelming.
QUESTIONS