Computational Information Geometry: Model Sensitivity and Approximate Cuts

Karim Anaya-Izquierdo (The Open University)
joint work with Frank Critchley (The Open University), Paul Marriott (Waterloo) and Paul Vos (East Carolina)

WOGAS 2, April 2010

Introduction

- Overall objective: provide tools to help understand sensitivity to model choice
- Target: applications of Generalised Linear Models
- Delivered via ... Computational Information Geometry
Model sensitivity: key ingredients

- θ: quantity/parameter of interest
- x: data
- f(x; θ): base/working model
⇒ inference about θ

What is the effect on inference about θ when considering models apart from f(x; θ)?
Model sensitivity: perturbing the base model

- 'Pick a direction' η
- Perturb f(x; θ) → g(x; θ, η) so that
  1. θ does not lose its meaning
  2. g(x; θ, η0) = f(x; θ) for some η0

Examples of 'directions' η:
- introduce extra variability (mixture model)
- introduce skewness
- introduce heavier tails
- introduce dependence

What is the effect on inference about θ when perturbing along the direction η?
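A minimal sketch (not from the talk) of one such perturbation in the 'extra variability' direction: a contamination mixture that inflates variance while leaving the mean untouched, so both requirements hold — g(x; θ, 0) = f(x; θ), and θ keeps its meaning as the mean for every η. The wide-component scale 3.0 is an arbitrary illustrative choice.

```python
import numpy as np

def norm_pdf(x, mu, sigma):
    """Normal density N(mu, sigma^2)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def g(x, theta, eta, sigma_wide=3.0):
    """Contamination-mixture perturbation of the base model f = N(theta, 1).

    eta = 0 recovers the base model, and the mean of g equals theta for
    every eta, so the interest parameter does not lose its meaning.
    """
    return (1.0 - eta) * norm_pdf(x, theta, 1.0) + eta * norm_pdf(x, theta, sigma_wide)

def trapz(y, x):
    """Trapezoidal rule (kept explicit to avoid NumPy-version differences)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

# Numerical check of both requirements on a grid wide enough to hold the tails.
x = np.linspace(-20.0, 24.0, 200001)
theta = 2.0
assert np.allclose(g(x, theta, 0.0), norm_pdf(x, theta, 1.0))  # g(.; theta, 0) = f
for eta in (0.0, 0.1, 0.5):
    assert abs(trapz(g(x, theta, eta), x) - 1.0) < 1e-6        # valid density
    assert abs(trapz(x * g(x, theta, eta), x) - theta) < 1e-6  # mean stays theta
```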
Cuts

- Model g(x; θ, η): θ is of interest, η is nuisance
- A cut is a statistic which allows inference about θ independently of the values of η
- Key idea is factorisation of the likelihood: if (T(x), S(x)) ⇔ x and (θ, η) ∈ Θ × Λ with

  L(θ, η; x) = L_i(θ; T(x)) × L_n(η; S(x))  for all θ, η, x,

  then T is a cut
- This is a very tight definition, valid only in very special cases: exponential families with the so-called mixed parametrisation
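A standard textbook instance of an exact cut, included here for illustration (it is not the talk's example): for independent X ~ Poisson(µ) and Y ~ Poisson(ν), take interest parameter ψ = µ/(µ+ν) and nuisance λ = µ+ν. The likelihood factorises into a Binomial term in ψ alone (the conditional of X given S = X + Y) times a Poisson term in λ alone (the marginal of S), so the factorisation above holds exactly. The sketch checks this numerically:

```python
from math import comb, exp, factorial

def poisson_pmf(k, lam):
    """Poisson probability mass function."""
    return exp(-lam) * lam ** k / factorial(k)

def binom_pmf(k, n, p):
    """Binomial probability mass function."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

mu, nu = 1.3, 2.1                    # component means (arbitrary test values)
psi, lam = mu / (mu + nu), mu + nu   # interest and nuisance parameters

# Joint pmf of (X, Y) equals (psi-factor) x (lambda-factor) at every point.
for x in range(12):
    for y in range(12):
        joint = poisson_pmf(x, mu) * poisson_pmf(y, nu)
        factored = binom_pmf(x, x + y, psi) * poisson_pmf(x + y, lam)
        assert abs(joint - factored) < 1e-12
```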
Cuts for model sensitivity

- Inferences on θ are made using T(x), for example:
  - an interval estimate for θ
  - the rejection region of a hypothesis test
- Ideal case: there exists an enlarged model g(x; θ, η) with cut (T(x), S(x))
  ⇒ the perturbation η0 → η does not affect inferences about θ
- In practice, relax the conditions to define an approximate cut:
  - inferences about θ based on T(x) 'do not change much' when perturbing in the direction η
  - the data x is fixed
  - for (θ, η) values where the data x support the use of g(x; θ, η)
- By complement we can find directions η with a large effect on inference:
  - confidence interval length substantially increased
  - do not reject instead of rejecting

Our approach

- Use Structured Extended Multinomial models (SEMs)
- Extended multinomials are multinomials which allow cell probabilities to be zero
- Discretising continuous data gives multinomials with structure on the cells
- An SEM is a proxy for the universal space of all distributions

Example

Question: what is the population mean θ?

[Figure: histogram of toy.data with one outlier marked, overlaid with candidate models: N(θ, 1); N(θ, σ²); N(θ, 1) with the outlier removed (Base); Logistic(µ(θ), σ); log-Normal(µ0(θ), σ²).]
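The 'interval length substantially increased' symptom can be made concrete on invented data resembling the histogram (the values below are illustrative, not the talk's toy.data): compare the 95% interval for the mean under the base model N(θ, 1), whose half-width is fixed, with the interval under the perturbed model N(θ, σ²), where the outlier inflates the estimated σ. The normal approximation is used for both intervals to keep the comparison simple; a t-quantile would widen the second interval further.

```python
import numpy as np

# Invented sample: a cluster near 1.5 plus one outlier at 7 (NOT the talk's data).
data = np.array([0.2, 0.8, 1.1, 1.3, 1.5, 1.6, 1.9, 2.2, 2.4, 3.0, 7.0])
n = len(data)

# Base model N(theta, 1): the 95% interval has fixed half-width 1.96 / sqrt(n).
half_base = 1.96 / np.sqrt(n)

# Perturbed model N(theta, sigma^2): the outlier inflates the estimated sigma,
# and with it the interval half-width 1.96 * s / sqrt(n).
s = data.std(ddof=1)
half_pert = 1.96 * s / np.sqrt(n)

# Perturbing in this direction substantially lengthens the interval.
assert s > 1.0 and half_pert > half_base
```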
Insensitive direction: CUT

[Figure: log-likelihood contours in the natural parameters (η1, η2) together with log-likelihood profiles in θ. Blue = mean; black = likelihood; red = base model; data \ outlier.]

Highly sensitive direction

[Figure: same layout for a highly sensitive perturbation direction η. Blue = mean; black = likelihood; red = base model; data \ outlier.]

Highly sensitive direction: Interpretation

[Figure: natural-parameter contours with the probability mass function of the perturbed model at successive points along the direction. Blue = mean; black = likelihood; red = base model; data \ outlier.]

Sensitive direction: Mixed Parameters (Duality)

[Figure: log-likelihood contours in the natural and mixed parametrisations for the sensitive direction.]

Insensitive direction: Mixed Parameters (Duality) CUT

[Figure: log-likelihood contours in the natural and mixed parametrisations for the insensitive (cut) direction.]

Perturbation Space

What is the mean?

[Figure: histogram of the data with the outlier marked.]

- Data \ outlier and base model N(µ, 1): only one direction is highly sensitive
- This corresponds to changing the variance, giving the model N(µ, σ²)
- If the outlier is included, one more parameter is needed

Other types of perturbations

We can perturb by:
- Translations · · · least informative directions
- Rotations · · · another story · · · bias, variance trade-off · · ·
- Adding nuisance parameters · · ·
- Down-weighting/deleting outliers · · ·

Computation

- Discretise and work in the SEM
- The extension g is an exponential family
- Construct the mixed parametrisation
- The Fisher information (variance) is block diagonal
- Choose directions where the θ block changes most with η

Summary

- Overall objective: provide tools to help understand sensitivity to model choice
- Target: applications of Generalised Linear Models
- Delivered via ... Computational Information Geometry (hidden from the user)

Work supported by EPSRC grant EP/E017878/1
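As a closing illustration of the block-diagonality step in the recipe above — not the authors' code, and on a deliberately tiny example: for a three-cell multinomial (an exponential family), pair the mean parameter µ1 of the interest direction with the natural parameter η2 of the nuisance direction. In this mixed parametrisation the Fisher information is block diagonal; the sketch verifies the vanishing cross-term numerically with finite-difference scores. All parameter values and step sizes are illustrative assumptions.

```python
import numpy as np

def probs(mu1, eta2):
    """Trinomial cell probabilities in the mixed parametrisation (mu1, eta2).

    Natural parametrisation: p_j proportional to exp(eta1*[j==1] + eta2*[j==2]).
    Solving E[1{j==1}] = mu1 for eta1 gives the closed form below.
    """
    e1 = mu1 * (1.0 + np.exp(eta2)) / (1.0 - mu1)
    z = 1.0 + e1 + np.exp(eta2)
    return np.array([1.0, e1, np.exp(eta2)]) / z

def fisher(mu1, eta2, h=1e-6):
    """Fisher information via central-difference scores for each cell."""
    p = probs(mu1, eta2)
    s_mu = (np.log(probs(mu1 + h, eta2)) - np.log(probs(mu1 - h, eta2))) / (2 * h)
    s_eta = (np.log(probs(mu1, eta2 + h)) - np.log(probs(mu1, eta2 - h))) / (2 * h)
    scores = np.vstack([s_mu, s_eta])   # 2 x 3 matrix of per-cell scores
    return (scores * p) @ scores.T      # E[score score^T]

info = fisher(0.3, 0.4)
assert abs(probs(0.3, 0.4)[1] - 0.3) < 1e-12  # mu1 really is the mean parameter
assert abs(info[0, 1]) < 1e-6                 # off-diagonal block vanishes
```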