Quadratic Minimisation Problems in Statistics

Casper Albers, Frank Critchley & John Gower
Department of Statistics, The Open University
Outline
• Introduction to problem (1)
• Statistical examples of problem (1)
• Geometrical insights: some easy, some hard
• Concluding remarks
The essential problem
(1)
• A and B are square matrices (of the same order p)
• A is p.d. or p.s.d.
• B can be anything
• The constraint is consistent
Equivalent forms
Eq. (1) can occur in many other shapes and forms, e.g.:
• min (x – t)′A(x – t) subject to (x – s)′B(x – s) + 2g′(x – s) = k
• min_x ||Xx – y||² subject to x′Bx + 2b′x = k
• min trace (X – T)′A(X – T) subject to trace (X′BX + 2G′X) = k
• We present a unified solution to all such problems.
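As an illustration of why these forms are equivalent, the least-squares objective in the second bullet can be rewritten as a quadratic form about a target t. This is a short derivation, assuming X has full column rank so that X′X is invertible; the constant c is a symbol introduced here and does not affect the minimiser.

```latex
\begin{aligned}
\|Xx - y\|^2 &= x'X'Xx - 2\,y'Xx + y'y \\
             &= (x - t)'A(x - t) + c, \\
\text{with}\quad A &= X'X, \qquad t = (X'X)^{-1}X'y, \qquad c = y'y - t'At .
\end{aligned}
```

The cross term matches because t′A = y′X(X′X)⁻¹X′X = y′X.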
General canonical form
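• After the simple affine transformations $z = T^{-1}x + m$ and $s = T^{-1}t + m$, where $T$ is such that

$$T'AT = \begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix}, \qquad T'BT = \begin{pmatrix} \Gamma_1 & 0 \\ 0 & \Gamma_0 \end{pmatrix},$$

(1) reduces to:

$$\min_z \|z - s\|^2 \quad \text{subject to} \quad z'\Gamma z + 2g'z = k$$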
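For the simpler case where A is positive definite and B is symmetric, a transformation T of the required kind can be built from a Cholesky factor and a spectral decomposition. The construction below is a standard one and is offered only as a sketch; the slides do not say how T is obtained, and the symbols L and Q are introduced here for illustration.

```latex
\begin{aligned}
A &= LL' && \text{(Cholesky factorisation, } A \text{ p.d.)} \\
L^{-1}B\,(L')^{-1} &= Q\,\Gamma\,Q' && \text{(spectral decomposition, } Q'Q = I,\ \Gamma \text{ diagonal)} \\
T &= (L')^{-1}Q \\
T'AT &= Q'L^{-1}(LL')(L')^{-1}Q = Q'Q = I \\
T'BT &= Q'\,L^{-1}B\,(L')^{-1}\,Q = \Gamma
\end{aligned}
```

With x = T(z – m) and t = T(s – m), the objective becomes (x – t)′A(x – t) = (z – s)′T′AT(z – s) = ||z – s||².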
Applications
Problem (1) arises, for example, in:
• Canonical analysis
• Normal linear models with quadratic constraints
• The fitting of cubic splines to a cloud of points
• Various forms of oblique Procrustes analysis
• Procrustes analysis with missing values
• Bayesian decision theory under quadratic loss
• Minimum distance estimation
• Hardy-Weinberg estimation
• Updating ALSCAL algorithm
• …
Application: Hardy-Weinberg
• Genotypes AA, BB, AB in proportions p = (p1, p2, p3)
• Observed proportions q = (q1, q2, q3)
• HW equilibrium constraint: p3² = 4 p1 p2
• Additional constraints: 1′ p = 1, p ≥ 0
• GCF (note the linear term in z1):

$$\min_z \; (z_1 - s_1)^2 + (z_2 - s_2)^2 \quad \text{subject to} \quad z_2^2 + \tfrac{\sqrt{6}}{3}\, z_1 = \tfrac{1}{6}$$
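The Hardy-Weinberg fit can also be checked numerically without passing through the canonical form. The sketch below assumes the usual parametrisation p = (θ², (1 – θ)², 2θ(1 – θ)), which satisfies p3² = 4 p1 p2, 1′p = 1 and p ≥ 0 automatically for θ in [0, 1]. It then minimises the squared distance to q by a grid search followed by a ternary-search refinement. The function names are hypothetical, and this is not the authors' algorithm.

```python
def hw_proportions(theta):
    """Genotype proportions (AA, BB, AB) under Hardy-Weinberg equilibrium."""
    return (theta ** 2, (1.0 - theta) ** 2, 2.0 * theta * (1.0 - theta))


def squared_distance(theta, q):
    """Squared Euclidean distance between p(theta) and the observed q."""
    return sum((pi - qi) ** 2 for pi, qi in zip(hw_proportions(theta), q))


def fit_hardy_weinberg(q, grid=1000, iters=80):
    """Least-squares fit of theta in [0, 1] (illustrative sketch only)."""
    step = 1.0 / grid
    # Coarse grid search to locate the basin of the best grid point.
    best = min((i * step for i in range(grid + 1)),
               key=lambda th: squared_distance(th, q))
    lo, hi = max(0.0, best - step), min(1.0, best + step)
    # Ternary search inside the bracket, assuming one local minimum there.
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if squared_distance(m1, q) < squared_distance(m2, q):
            hi = m2
        else:
            lo = m1
    theta = 0.5 * (lo + hi)
    return theta, hw_proportions(theta)


if __name__ == "__main__":
    # Observed proportions that already satisfy HW with theta = 0.3.
    theta_hat, p_hat = fit_hardy_weinberg((0.09, 0.49, 0.42))
    print(theta_hat, p_hat)
```

Working in θ turns the constrained problem into an unconstrained one-dimensional search, which is convenient for checking a solution obtained from the canonical form.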
Indefinite constrained regression
• Ten Berge (1983) considers for the ALSCAL algorithm:
• The GCF has eigenvalues:
(1 + √2, ½, 1 - √2)
Ratios of quadratic forms (1)
• Canonical analysis: min x′Wx / x′Bx.
• When W or B is of full rank, we have:
min x′Wx s.t. x′Bx = 1, of form (1) with
Lagrangian Wx = λBx.
• BUT: the ratio form requires only a weak constraint while
if the Lagrangian is taken as fundamental, the constraint
becomes strong (see Healy & Goldstein, 1976, for x′1 = 1).
• In canonical analysis, multiple solutions are standard but
seem to have no place in our more general problem (1).
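The link between the constrained form and the two-sided eigenvalue equation can be written out in a few lines. This is a sketch assuming W and B are symmetric; the symbol 𝓛 for the Lagrangian function is introduced here.

```latex
\begin{aligned}
\mathcal{L}(x,\lambda) &= x'Wx - \lambda\,(x'Bx - 1), \\
\frac{\partial \mathcal{L}}{\partial x} &= 2Wx - 2\lambda Bx = 0
  \;\Longrightarrow\; Wx = \lambda Bx, \\
x'Wx &= \lambda\, x'Bx = \lambda .
\end{aligned}
```

At a stationary point the objective therefore equals λ, so the constrained minimum is attained at the smallest λ for which Wx = λBx has a solution with x′Bx = 1. The ratio x′Wx / x′Bx itself is unchanged when x is rescaled.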
Ratios of quadratic forms (2)
When both A and B are of deficient rank:
• In the canonical case, the ANOVA T = W + B implies that
the null space of T is shared by B and W, and a simple
modification of the usual two-sided eigenvalue solution
suffices.
• However, for general matrices A, B things become much
more complicated.
Geometry helps understanding
The following slides illustrate the problem geometrically, showing some of the complications that have to be covered by the algebra and algorithms.
PD and indefinite case
(Figure panels: B is positive definite; B is indefinite)
Lower dimensional target space
Indefinite constraints
(Figure panels: full dimensional target space; lower dimensional target space)
Parabola
Projections onto target space
(Figure panels: B not canonical; B canonical)
Fundamental Canonical Form
• (1) boils down to min_z ||z – s||² subject to z′Γz = k
• This gives the Lagrangian form: ||z – s||² – λ(z′Γz – k)
• With z = (I – λΓ)⁻¹ s, the constraint becomes f(λ) = Σ_i γ_i s_i² / (1 – λγ_i)² = k, where γ_1 ≤ … ≤ γ_p are the diagonal elements of Γ
• In general, solutions are found by solving this Lagrangian equation for λ
• Feasible region (FR):
– When B is indefinite:
1/γ1 ≤ λ ≤ 1/γp
– When B is p.(s.)d.:
–∞ ≤ λ ≤ 1/γp
– f(λ) increases monotonically in the FR
• If s1 or sp is zero, adaptations are necessary
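The monotonicity of f(λ) in the feasible region suggests a simple root search. The sketch below treats only the indefinite case with s1 and sp non-zero, where f(λ) = Σ γ_i s_i² / (1 – λγ_i)² runs from –∞ to +∞ across 1/γ1 < λ < 1/γp, and it uses plain bisection. The function names and tolerances are assumptions made for illustration; this is not the authors' general-purpose algorithm, and it does not handle the adaptations needed when s1 or sp is zero.

```python
def constraint_value(lam, gammas, s):
    """f(lambda) = sum_i gamma_i * s_i^2 / (1 - lambda * gamma_i)^2."""
    return sum(g * si ** 2 / (1.0 - lam * g) ** 2 for g, si in zip(gammas, s))


def solve_canonical(gammas, s, k, tol=1e-12, max_iter=200):
    """Bisection for f(lambda) = k in the feasible region (indefinite case).

    Assumes min(gammas) < 0 < max(gammas) and that the entries of s paired
    with the extreme gammas are non-zero, so f is unbounded at both ends.
    """
    g_min, g_max = min(gammas), max(gammas)
    if not g_min < 0.0 < g_max:
        raise ValueError("this sketch covers only the indefinite case")
    lo, hi = 1.0 / g_min, 1.0 / g_max
    # Step just inside the open interval to stay clear of the asymptotes.
    eps = 1e-9 * (hi - lo)
    lo, hi = lo + eps, hi - eps
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        # f increases monotonically here, so the sign test locates the root.
        if constraint_value(mid, gammas, s) < k:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    z = [si / (1.0 - lam * g) for g, si in zip(gammas, s)]
    return lam, z


if __name__ == "__main__":
    gammas = [-1.0, 0.5, 2.0]
    s = [1.0, 1.0, 1.0]
    lam, z = solve_canonical(gammas, s, k=1.0)
    print(lam, z, sum(g * zi ** 2 for g, zi in zip(gammas, z)))
```

Bisection is chosen here for robustness near the asymptotes; a Newton-type iteration would converge faster but needs safeguarding to remain inside the feasible region.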
Lagrangian forms
(Figure panels: B indefinite; B p.(s.)d.)
Lagrangian forms: phantom asymptotes
(Figure labels: root; s1 = 0; s2 = 0)
Movement from the origin
Movement along the major axis
Conclusions
• Equation (1) subsumes many statistical problems.
• A unified methodology eliminates examination of many
special cases.
• Geometry helps understanding; algebra helps detailed
analysis and provides essential underpinning for a
general purpose algorithm.
• By identifying potential pathological situations, the
algorithm can
• be made robust
• provide warnings.
Conclusions (informal)
• The unification is interesting and potentially useful.
• Its usefulness largely depends on the availability of a
general purpose algorithm. Coming soon.
• Algorithms depend on detailed algebraic underpinning. Done.
• Developing the algebra depends on understanding the geometry. Done.
Some references
• C.J. Albers, F. Critchley, J.C. Gower, Quadratic Minimisation
Problems in Statistics, 21st century
• M.W. Browne, On oblique Procrustes rotation, Psychometrika 32, 1967
• J.M.F. ten Berge, A generalization of Verhelst’s solution for a constrained
regression problem in ALSCAL and related MDS algorithms, Psychometrika 48, 1983
• F. Critchley, On the minimisation of a positive definite quadratic form under
quadratic constraints: analytical solution and statistical applications. Warwick
Statistics Research Report, 1990
• M.J.R. Healy and H. Goldstein, An approach to the scaling of categorical
attributes, Biometrika 63, 1976
• J. de Leeuw, Generalized eigenvalue problems with psd matrices,
Psychometrika 47, 1982
• J.J. Moré, Generalizations of the trust region problem, Optimization Methods
and Software, Vol. II, 1993
• J.C. Gower & G.B. Dijksterhuis, Procrustes Problems, Oxford University Press, 2004