E.G. Tennis Match Prediction

Problem
• Wish to predict outcome of championship
tennis matches.
Q1 What info do we have
• Full history of previous matches on different
courts at different times by different players.
Court info. Player info. Weather etc.
• New players: less info.
• Surrogate information: Association of Tennis Professionals
(ATP) rankings (ordinal), ATP points (numeric).
• Some player pairs will never have played each other before.
Q2 What factors might affect things
• Known
– ATP rank, score, age, court type.
– Particular pairing issues (A seems to always
lose to B).
– More recent history.
• Latent
– Recent injuries
– Latent rank
– Current form
Q3 How to model
• Particular pairings matter.
• But too many pairings (O(n^2)).
• Rank (O(n)) easier to work with.
• Use only known info for now.
• One approach
– Use rank and other O(n) features to predict the probability of the
outcome of any given pairing using a discriminative approach, e.g. a
neural network.
– Remember symmetry in design! Remember things change with
time – check results for different eras.
– Then use the prediction for a particular pair to set the prior of a
Bernoulli-Beta model for that individual pairing.
– Use historic data for this particular pairing to refine the posterior
Beta distribution.
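
The last two steps can be sketched as a standard Beta-Bernoulli update. This is a minimal illustration, not the lecture's implementation: the function names and the `strength` pseudo-count (how much weight the network's prediction gets relative to head-to-head results) are assumptions made for the example.

```python
def beta_prior_from_prediction(p_model, strength):
    """Build a Beta(a, b) prior whose mean equals the model's prediction.

    strength = a + b is a hypothetical pseudo-count: larger values mean the
    network's prediction is trusted more than the pairing's own history.
    """
    if not 0.0 < p_model < 1.0:
        raise ValueError("p_model must lie strictly between 0 and 1")
    if strength <= 0.0:
        raise ValueError("strength must be positive")
    return p_model * strength, (1.0 - p_model) * strength


def update_with_head_to_head(a, b, wins, losses):
    """Conjugate update: add observed wins and losses to the Beta counts."""
    return a + wins, b + losses


def beta_mean(a, b):
    """Posterior mean probability that player A beats player B."""
    return a / (a + b)


# Network says A beats B with probability 0.7, but A has lost 4 of 5 meetings.
a0, b0 = beta_prior_from_prediction(0.7, strength=10.0)
a1, b1 = update_with_head_to_head(a0, b0, wins=1, losses=4)
posterior = beta_mean(a1, b1)  # pulled from 0.7 down to about 0.53
```

With few head-to-head matches the posterior stays close to the network's prediction; with many it is dominated by the pairing's own record.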
Q3 How to model (continued)
• One approach (continued)
– Representation matters. Represent things in the most informative way
you can, without compromising too much on flexibility.
– E.g. rank. Difference in rank (maybe; simple)? Both players' ranks (yes,
probably). Represent numerically (probably not just this). Represent
using temperature encoding?
– Maybe. But do we want separate labels for rank 200 and rank 201?
Probably not.
– Maybe use numeric ranks AND temperature encoding (on a log scale).
– Maybe work on refining rank representation before doing anything else.
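
One possible reading of the rank representation and the symmetry requirement is sketched below. It assumes "temperature encoding" means what is often called thermometer (unary) encoding, with thresholds at powers of two so that the code is on a log scale. The linear scorer on the difference of the two players' encodings is a stand-in for the neural network, chosen only because it makes P(A beats B) + P(B beats A) = 1 by construction. All names and weights are hypothetical.

```python
import math


def thermometer_log_rank(rank, n_bits=10):
    """Unary code on a log scale: bit k is 1 when rank >= 2**(k + 1).

    Nearby ranks such as 200 and 201 share the same code, while rank 1
    and rank 2 are still distinguished.
    """
    if rank < 1:
        raise ValueError("rank must be a positive integer")
    return [1.0 if rank >= 2 ** (k + 1) else 0.0 for k in range(n_bits)]


def encode_player(rank):
    """Numeric rank AND its log-scale thermometer code."""
    return [float(rank)] + thermometer_log_rank(rank)


def match_logit(weights, rank_a, rank_b):
    """Antisymmetric score: swapping the players flips the sign."""
    enc_a, enc_b = encode_player(rank_a), encode_player(rank_b)
    return sum(w * (x - y) for w, x, y in zip(weights, enc_a, enc_b))


def win_probability(weights, rank_a, rank_b):
    """Probability that A beats B under the toy linear scorer."""
    return 1.0 / (1.0 + math.exp(-match_logit(weights, rank_a, rank_b)))


# Hypothetical weights: negative because a lower rank number is better.
example_weights = [-0.001] + [-0.3] * 10
p_ab = win_probability(example_weights, rank_a=3, rank_b=50)
p_ba = win_probability(example_weights, rank_a=50, rank_b=3)
```

Feeding both players' full encodings to a network, rather than only their difference, is more flexible but then symmetry has to be enforced some other way, for example by training on each match in both orders.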
Q4 What problems
• No good as it stands: the matches in a pairing's history took
place at different times.
• Players change.
• Need MLP output for different times. But this gives
different Beta distributions.
• So instead use output of neural network before going
through the sigmoid.
• Try to estimate bias for individual pairing.
• Little data for individual pairings: need to be Bayesian.
• Put Gaussian prior distribution on bias. Use approximate
Bayesian methods to update bias distribution (next
lectures).
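
The bias idea can be illustrated with a Laplace approximation, used here only as one example of an approximate Bayesian method; the slides do not say which method is intended. For each past match in a pairing, `logits` holds the network's pre-sigmoid output at that time and `outcomes` holds 1 if player A won. The function name, `prior_var`, and the data are assumptions for the sketch.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def laplace_bias_posterior(logits, outcomes, prior_var=1.0, n_iter=25):
    """Gaussian approximation to the posterior of a per-pairing bias b.

    Model: P(A wins match i) = sigmoid(logits[i] + b), with b ~ N(0, prior_var).
    Newton's method finds the posterior mode; the curvature of the negative
    log posterior at the mode gives the approximate posterior precision.
    """
    b = 0.0
    for _ in range(n_iter):
        probs = [sigmoid(z + b) for z in logits]
        grad = sum(y - p for y, p in zip(outcomes, probs)) - b / prior_var
        curvature = sum(p * (1.0 - p) for p in probs) + 1.0 / prior_var
        b += grad / curvature
    probs = [sigmoid(z + b) for z in logits]
    curvature = sum(p * (1.0 - p) for p in probs) + 1.0 / prior_var
    return b, 1.0 / curvature


# The network favoured A in all four past meetings, yet A won only once.
past_logits = [0.8, 0.5, 1.1, 0.9]
past_outcomes = [0, 0, 1, 0]
bias_mean, bias_var = laplace_bias_posterior(past_logits, past_outcomes)

# Plug-in prediction for a new match where the network outputs logit 0.7.
p_new = sigmoid(0.7 + bias_mean)
```

Because the bias lives on the pre-sigmoid scale, it can be combined with a different network output for every match date, which is what the Beta formulation could not do.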
Further still
• Players have styles: use who beats who to
provide player groupings.
• See e.g. collaborative filtering.
• Experts may have access to info that is
hard to encode. Incorporate expert
predictions into data.
• At all stages: check it answers the
questions you want it to.
Q5 What next
• Get data. Check data. Check outliers. Check
consistency:
• Do things change over time? Courts change. Rules
change. Etc.
• Get data into the right format. How to represent ordinal
data?
• Build network. Add constraints, train, validate. Check
assumptions. Is this actually going to work well enough –
should see at this stage if it is definitely not.
• Refine predictions using individual pairings.
• Revalidate. Recheck assumptions.
• Anything not quite right? Look carefully. Explain all
observations. Know what is going on.
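
One simple way to check whether things change over time is to score the predictions separately for each era. The sketch below assumes records of the form (year, predicted probability, outcome) and groups them by decade; the record layout, the function name, and the choice of log loss are all illustrative assumptions.

```python
import math
from collections import defaultdict


def log_loss_by_era(records, era_length=10):
    """Mean log loss per era for (year, predicted_prob, outcome) records."""
    totals = defaultdict(lambda: [0.0, 0])  # era -> [summed loss, count]
    for year, prob, outcome in records:
        prob = min(max(prob, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        loss = -(outcome * math.log(prob) + (1 - outcome) * math.log(1.0 - prob))
        era = (year // era_length) * era_length
        totals[era][0] += loss
        totals[era][1] += 1
    return {era: total / count for era, (total, count) in sorted(totals.items())}


# Toy records, invented for the example.
toy_records = [(1995, 0.5, 1), (1998, 0.5, 0), (2012, 0.9, 1)]
era_losses = log_loss_by_era(toy_records)
```

A loss that is clearly worse in some eras than in others suggests the model is missing something that changed, such as courts or rules.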
Q6 How to deploy
• Test in the field.
• Test in the field.
• Refine.
• Test in the field.
• Test in the field.
• Refine.
• Test in the field.
• Test in the field.
• Refine.
• Test in the field.
• Test in the field.
• Refine.
• Test in the field.
• Test in the field.
• Freeze.
• Test in the field x10.
• Deploy (or ditch at earlier stage).