This is an example of a bad talk. (Disclaimer: the paper that should have been presented in this talk is a classic in the field and a great paper; this talk, not the paper, is rotten.)

On the Foundations of Relaxation Labeling Processes
By An Anonymous Student

Overview
• Motivation
• I. Introduction to Labeling Problems
• II. Continuous Relaxation Labeling Processes
• III. Consistency
• IV. Overview of Results
• V. Average Local Consistency
• VI. Geometric Structure of Assignment Space
• VII. Maximizing Average Local Consistency
• VIII. The Relaxation Labeling Algorithm
• IX. A Local Convergence Result
• X. Generalizations to Higher Order Compatibilities
• XI. Comparisons with Standard Relaxation Labeling Updating Schemes
• XII. Summary and Conclusions
• Appendix A

Motivation
• Two concerns:
  – The decomposition of a complex computation into a network of simple “myopic”, or local, computations
  – The requisite use of context in resolving ambiguities

Motivation
• Relaxation operations: used to solve systems of linear equations, etc.
• Relaxation labeling:
  – An extension of relaxation operations
  – Solutions involve symbols rather than functions
  – Weights are attached to labels
• Main difference: labels do not necessarily have a natural ordering

Motivation
• Algorithm:
  – Parallel
  – Each process makes use of the context to assist in a labeling decision
• Goal:
  – Provide a formal foundation
    • Characterize what the algorithm is doing, so that the cause of a failure can be attributed, for example to an inadequate theory

Motivation
• Treatment:
  – Abstract
    • To relate discrete relaxation to a description of the usual relaxation labeling schemes
    • To develop a theory of consistency
    • To formalize its relationship to optimization
    • Several mathematical results

I.
Introduction to Labeling Problems
• In a labeling problem, one is given:
  – A set of objects
  – A set of labels for each object
  – A neighbor relation over the objects
  – A constraint relation over labels at pairs (or n-tuples) of neighboring objects
• Solution: an assignment of labels to each object that is consistent with respect to the constraint relation

I. Introduction to Labeling Problems
• λ: variable that either denotes a label or serves as an index through a set of labels
• Λ_i: set of labels attached to node i
• Λ_ij: constraint relation listing all pairs (λ,λ’) such that λ at i is consistent with λ’ at j
• m: number of labels in Λ_i
• n: number of nodes in G (the graph of objects and neighbor relations)
• S_i(λ): support function for label λ at i from a discrete labeling (counts the neighbors of object i that have a label compatible with the given label λ at i)
• A max over the labels at each neighbor j is used because more than one label can have value 1 at j

I. Introduction to Labeling Problems
• Discrete relaxation
  – Label discarding rule: discard a label λ at a node i if there exists a neighbor j of i such that every label λ’ currently assigned to j is incompatible with λ at i ((λ,λ’) ∉ Λ_ij for all λ’ assigned to j)
  – A label is retained if at every neighboring node there exists at least one compatible label

II. Continuous Relaxation Labeling Processes
• Limitation of the discrete formulation in I:
  – Pairs of labels are either compatible or completely incompatible
  – It cannot express a preference or relative dislike
• Solution:
  – Continuous relaxation labeling
  – Weighted values representing relative preferences

II. Continuous Relaxation Labeling Processes
• Compatibility r_ij(λ,λ’): relative support for label λ at object i that arises from label λ’ at object j
  – Positive: locally consistent pair
  – Negative: implied inconsistency
  – The magnitude of r_ij(λ,λ’) is proportional to the strength of the constraint
  – If i and j are not neighbors, r_ij(λ,λ’) = 0

II.
Continuous Relaxation Labeling Processes
• Difficulty: formulating what a consistent labeling is
  – A consistent labeling is one in which the constraints are satisfied
  – Logical constraints are replaced by weighted assertions, so a new foundation is required to describe the structural framework and the precise meaning of the goal of consistency

II. Continuous Relaxation Labeling Processes
• Structural frameworks attempted:
  – Define consistency as the stopping points of the algorithm
    • Circular; gives no clue about what the algorithm is doing
  – Regard the label weights as probabilities and use Bayesian analysis, statistical quantities, etc.
    • Unsuccessful; various independence assumptions are required
  – Optimization theory: a vector composed of the current label weights and an evidence vector involving each label’s neighborhood weights
    • The authors extend this approach
  – Linear programming: constraints are obtained from arithmetical equivalents; preferences can be incorporated only by adding new labels
    • Different, interesting, and not incompatible with the authors’ development

II. Continuous Relaxation Labeling Processes
• Prototype (original) algorithm:
  – An iterative, parallel procedure analogous to the label discarding rule used in discrete relaxation
  – For each object and each label, one computes q_i(λ) = Σ_j Σ_λ’ r_ij(λ,λ’) p_j(λ’) (as support function) using the current assignment values p_i(λ). Then new assignment values are defined according to p_i(λ) ← p_i(λ)[1 + q_i(λ)] / Σ_μ p_i(μ)[1 + q_i(μ)]

III. Consistency
• Requires a system of inequalities
• Permits the logical constraints to be ordered, or weighted
• Allows an analytic, rather than logical or symbolic, study
• Definition of consistency:
  – For unambiguous labelings
  – For weighted labeling assignments

III. Consistency
• Unambiguous labeling assignment: a mapping from the set of objects into the set of all labels; each object is associated with exactly one label
• Space of unambiguous labelings: K* = { p̄ : p_i(λ) ∈ {0,1} for all i and λ, and Σ_λ p_i(λ) = 1 for all i }

III. Consistency
• Weighted labeling assignments: replace the condition p_i(λ) ∈ {0,1} by the condition 0 ≤ p_i(λ) ≤ 1, which gives the space K
• K is simply the convex hull of K*

III.
Consistency
• Consistency depends on constraints between labels: the compatibility matrix, whose elements indicate both positive and negative constraints
• Definition 3.1 (support): in the labeling spaces K* and K, the max in the support function of I is replaced by a sum, so the support is linear in p̄: s_i(λ) = s_i(λ; p̄) = Σ_j Σ_λ’ r_ij(λ,λ’) p_j(λ’)

III. Consistency
• Higher order combinations of object labels:
  – Multidimensional matrix of compatibilities: r_ijk(λ,λ’,λ’’)
  – Support at object i for label λ: s_i(λ) = Σ_j,k Σ_λ’,λ’’ r_ijk(λ,λ’,λ’’) p_j(λ’) p_k(λ’’)
• Definition 3.2: The unambiguous labeling p̄ ∈ K*, which assigns label λ_i to object i, is consistent provided s_i(λ_i) ≥ s_i(λ) for all λ, i = 1, …, n
• Consistency in K* corresponds to satisfying a system of inequalities: one inequality s_i(λ_i) ≥ s_i(λ) for each object i and each label λ

III. Consistency
• At a consistent unambiguous labeling, the support, at each object, for the assigned label is the maximum support at that object.
• Given a set of objects, labels, and support functions, there may be many consistent labelings.
• Condition for consistency in K* (restated): Σ_λ p_i(λ) s_i(λ; p̄) ≥ Σ_λ v_i(λ) s_i(λ; p̄), i = 1, …, n, for all v̄ ∈ K*

III. Consistency
• Definition 3.3: A weighted labeling assignment p̄ ∈ K is consistent provided Σ_λ p_i(λ) s_i(λ; p̄) ≥ Σ_λ v_i(λ) s_i(λ; p̄), i = 1, …, n, for all v̄ ∈ K
• Definition 3.4: p̄ is strictly consistent provided the inequality of Definition 3.3 is strict (for v̄ ≠ p̄)
• An unambiguous assignment that is consistent in K will also be consistent in K*, since K* ⊂ K. The converse is also true (3.5).

III. Consistency
• Proposition 3.5: An unambiguous labeling which is consistent in K* is also consistent in K.

IV. Overview of Results
• Algorithm for converting a given labeling into a consistent one:
  – Two approaches:
    • Optimization theory
    • Finite variational calculus
  – Both lead to the same algorithm
• Achieving consistency is equivalent to solving a variational inequality:
  • Theorem 4.1: p̄ ∈ K is consistent iff Σ_i Σ_λ s_i(λ; p̄) [v_i(λ) − p_i(λ)] ≤ 0 for all v̄ ∈ K

IV. Overview of Results
• Two paths to study consistency and derive algorithms for achieving it.

V. Average Local Consistency
• Goal: update a nearly consistent labeling to a consistent one
• Σ_λ p_i(λ) s_i(λ) should be large at each object i => Σ_i Σ_λ p_i(λ) s_i(λ) should be large =>
• Average local consistency A(p̄) = Σ_i Σ_λ p_i(λ) s_i(λ) should be large.
  – Two problems:
    • Maximizing a sum doesn’t necessarily maximize each individual term
    • The individual components s_i(λ) depend on p̄, which varies during the maximization process

V.
Average Local Consistency
• Maximizing A(p̄) = Σ_i Σ_λ p_i(λ) s_i(λ) is the same as maximizing the single sum Σ_i,j Σ_λ,λ’ r_ij(λ,λ’) p_i(λ) p_j(λ’), which is not the same as maximizing the n quantities Σ_λ p_i(λ) s_i(λ), i = 1, …, n, individually

V. Average Local Consistency
• Special case: the compatibility matrix is symmetric; maximizing A(p̄) leads to consistent labeling assignments.
• General case: the compatibility matrix is not symmetric. Section VIII gives an algorithm for this case.
  – Locally maximizing A(p̄) gives the same result as if the matrix were symmetrized, since A is a quadratic form in p̄

V. Average Local Consistency
• Gradient ascent: finds local maxima of a smooth functional A(p̄) by successively moving the current p̄ by a small step to a new p̄.
• The amount of increase in A(p̄) is related to the directional derivative of A in the direction of the step.
• The gradient: ∂A/∂p_i(λ) = Σ_j Σ_λ’ [r_ij(λ,λ’) + r_ji(λ’,λ)] p_j(λ’)

V. Average Local Consistency
• When the compatibilities are symmetric, r_ij(λ,λ’) = r_ji(λ’,λ):
  • ∂A/∂p_i(λ) = 2 Σ_j Σ_λ’ r_ij(λ,λ’) p_j(λ’)
  • q_i(λ) = ½ ∂A/∂p_i(λ) = s_i(λ) (cf. Definition 3.1): intermediate updating “direction”

VI. Geometric Structure of Assignment Space
• Goal: to discuss gradient ascent on K, and to visualize the more general updating algorithms.
• A simple example: n = 2 objects with m = 3 possible labels each (the weights at each object lie on a 2-simplex)

VI. Geometric Structure of Assignment Space
• Vector p̄: two points, each lying in a copy of the space shown in Fig. 2.
• K: the set of all pairs of points in two copies of the triangular space in Fig. 2
• K with n objects, each with m labels:
  – Space: n copies of an (m−1)-simplex
  – K: the set of all n-tuples of points, each point lying in a copy of the (m−1)-dimensional surface
  – A weighted labeling assignment is a point in the assignment space K
  – An unambiguous labeling is one of the “corners”
  – Each simplex has m corners

VI. Geometric Structure of Assignment Space
• Tangent space: a surface that lies “tangent” to the entire surface when placed at the given point; it is the set of all directions at that point
  – K is flat, so K and the tangent space placed at the point coincide near that point
  – At an interior point of the surface: a vector space
  – At a boundary point of the surface: a convex subset of a vector space

VI.
Geometric Structure of Assignment Space
• p̄: a labeling assignment in K
• v̄: any other assignment in K
• Difference vector (direction): v̄ − p̄

VI. Geometric Structure of Assignment Space
• Set of all tangent vectors at p̄ (as v̄ roams around K): T_p̄ = { α(v̄ − p̄) : v̄ ∈ K, α ≥ 0 }
• The set of tangent vectors at an interior point p̄ consists of an entire subspace: T_p̄ = { ū : Σ_λ u_i(λ) = 0, i = 1, …, n }

VI. Geometric Structure of Assignment Space
• p̄ lies on a boundary of K: a proper subset of the above space: T_p̄ = { ū : Σ_λ u_i(λ) = 0 for all i, and u_i(λ) ≥ 0 whenever p_i(λ) = 0 }

VII. Maximizing Average Local Consistency
• To find a consistent labeling:
  – Constraints are symmetric: gradient ascent
  – Constraints are not symmetric: same algorithm (VIII)
  – The increase in A(p̄) due to a small step of length α in the direction ū is approximately proportional to the directional derivative grad A(p̄) · ū, with ||ū|| = 1 (the greatest increase in A(p̄) can be expected if the step is taken in the tangent direction ū that maximizes it)

VII. Maximizing Average Local Consistency
• To find the direction of steepest ascent (Problem 7.1): grad A(p̄) · ū should be maximized over tangent directions ū at p̄ with ||ū|| ≤ 1 (a solution always exists)

VII. Maximizing Average Local Consistency
• Lemma 7.3: If p̄ lies in the interior of K, then the following algorithm solves Problem 7.1
  – With q̄ = ½ grad A(p̄), subtract from each q_i(λ) its mean over the labels at object i, then normalize the result to unit length (or take ū = 0 if it vanishes)
  – May fail when p̄ is a boundary point of K (solved using the algorithm in Appendix A)

Appendix A. Updating Direction Algorithm
• Gives an algorithm to replace the updating formulas in common use in relaxation labeling processes.
• Gives a projection operator (a finite iterative algorithm) based on consistency theory and permitting proofs of convergence results.
• Solution to the projection problem: the returned vector ū.
• Normalization: ||ū|| = 1 (or ū = 0)
• Step length: α_i

VII. Maximizing Average Local Consistency
• Algorithm 7.4: finds consistent labelings when the matrix of compatibilities is symmetric
  – Successive iterates are obtained by moving a small step in the direction of the projection of the gradient
  – The algorithm stops when the projection is 0

VII. Maximizing Average Local Consistency
• Proposition 7.5: Suppose p̄ is a stopping point of Algorithm 7.4. Then, if the matrix of compatibilities is symmetric, p̄ is consistent.

VIII.
The Relaxation Labeling Algorithm
• The entire previous analysis of average local consistency relies on the assumption of symmetric compatibilities.
• Example of asymmetric compatibilities: constraints between letters in English
• Theorem 4.1 is general (variational inequality)

VIII. The Relaxation Labeling Algorithm
• q_i(λ) = Σ_j Σ_λ’ r_ij(λ,λ’) p_j(λ’)
• Observation 8.1: With q̄ defined as above, the variational inequality is equivalent to the statement q̄ · t̄ ≤ 0 for every tangent direction t̄ at p̄. A labeling p̄ is consistent iff q̄ points away from all tangent directions.
• Algorithm 8.2 (The Relaxation Labeling Algorithm): compute q̄ at the current p̄, project q̄ onto the tangent set at p̄ to obtain ū (Appendix A), and move p̄ a small step in the direction ū; stop when ū = 0

VIII. The Relaxation Labeling Algorithm
• Proposition 8.3: Suppose p̄ is a stopping point of Algorithm 8.2. Then p̄ is consistent.
• Questions:
  – Are there any consistent labelings for the relaxation labeling algorithm to find? (Answered by 8.4)
  – Assuming that such points exist, will the algorithm find them? (Answered in IX)
  – Even if a relaxation labeling process converges to a consistent labeling, is the final labeling better than the initial assignment? (Not well defined)

VIII. The Relaxation Labeling Algorithm
• Example of English
• Proposition 8.4: The variational inequality of Theorem 4.1 always has at least one solution. Thus consistent labelings always exist, for arbitrary compatibility matrices.
• Usually, more than one solution will exist.

IX. A Local Convergence Result
• As the step size of the relaxation labeling algorithm (7.4 or 8.2) becomes infinitesimal, these discrete algorithms approximate a dynamical system: dp̄/dt = ū(p̄), where ū(p̄) is the updating direction computed at p̄
• Hypothesis of 9.1: the labeling at every object is close to the consistent assignment

IX. A Local Convergence Result
• Assume that p̄ is strictly consistent in order to prove that it is a local attractor of the relaxation labeling dynamical system
• If p̄ is consistent, but not strictly consistent, it may be:
  – A local attractor of the dynamical system
  – A saddle point
  – An unstable stopping point

X.
Generalizations to Higher Order Compatibilities
• Consistency can be defined using support functions that depend on arbitrary orders of compatibilities:
  – First-order compatibilities: s_i(λ) = r_i(λ)
  – Third-order: s_i(λ) = Σ_j,k Σ_λ’,λ’’ r_ijk(λ,λ’,λ’’) p_j(λ’) p_k(λ’’)
    • Symmetry condition: r_ijk(λ,λ’,λ’’) is unchanged under any permutation of the pairs (i,λ), (j,λ’), (k,λ’’)

X. Generalizations to Higher Order Compatibilities
  – k-order compatibilities: s_i(λ) is the sum, over the other k−1 objects and their labels, of the k-th order compatibility times the product of the k−1 corresponding weights
    • Symmetry condition: the compatibility is unchanged under any permutation of its k (object, label) pairs

X. Generalizations to Higher Order Compatibilities
• Compatibilities higher than second order, or nonpolynomial compatibilities:
  – Difficulty: combinatorial growth in the number of required computations
  – Most implementations of relaxation labeling processes have limited the computations to second-order compatibilities

XI. Comparisons with Standard Relaxation Labeling Updating Schemes
• Algorithm 8.2 updates weighted labeling assignments by computing q̄, then updating p̄ in the direction defined by the projection of q̄ onto the tangent set at p̄
• The other two standard formulas for relaxation labeling:
  – p_i(λ) ← p_i(λ)[1 + q_i(λ)] / Σ_μ p_i(μ)[1 + q_i(μ)]

XI. Comparisons with Standard Relaxation Labeling Updating Schemes
  – p_i(λ) ← p_i(λ) q_i(λ) / Σ_μ p_i(μ) q_i(μ)
• The denominator is a normalization term
• The numerator of the first formula can be rewritten as: p_i(λ) + p_i(λ) q_i(λ)

XII. Summary and Conclusions
• Relaxation labeling processes: mechanisms for employing context and constraints in labeling problems.
• Background: lacking a proper model characterizing the process and its stopping points, the choice of the coefficient values and of the updating formula is subject only to empirical justification.
• Achievement: develops the foundations of a theory of consistency that explains what relaxation labeling accomplishes and leads to a relaxation algorithm with an updating formula using a projection operator.

XII. Summary and Conclusions
• Discrete relaxation: a label is discarded if it is not supported by the local context of assigned labels.
• Weighted label assignment: An unambiguous labeling is consistent if the support for the instantiated label at each object is greater than or equal to the support for all other labels at that object.
• The relaxation labeling process defined by Algorithm 8.2, with the projection operator specified in Appendix A, stops at consistent labelings.
• The dynamic process will converge to a consistent labeling if one begins sufficiently near it and it is strictly consistent (see IX).

XII. Summary and Conclusions
• Symmetry properties: with symmetric compatibilities, the relaxation labeling algorithm is equivalent to gradient ascent on the average local consistency function.
• Future work:
  – Efficient implementations of the projection operator
  – Choice of the step size
  – Normalization methods

Thank you
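
Backup: Label Discarding Rule (sketch)
• The label discarding rule from I can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function name discrete_relaxation, the dict-based data layout, and the three-node example with a "neighboring labels must differ" constraint are all assumptions made here.

```python
def discrete_relaxation(labels, neighbors, compatible):
    """Apply the label discarding rule until no label can be removed.

    labels:     dict node -> set of candidate labels (hypothetical layout)
    neighbors:  dict node -> list of neighboring nodes
    compatible: function (i, lam, j, lam2) -> bool, standing in for the
                constraint relation: True if lam at i is consistent with lam2 at j
    """
    # Work on a copy so the caller's label sets are left untouched.
    current = {i: set(ls) for i, ls in labels.items()}
    changed = True
    while changed:
        changed = False
        for i in current:
            for lam in list(current[i]):
                # A label is retained only if every neighbor j still holds
                # at least one label compatible with lam at i.
                supported = all(
                    any(compatible(i, lam, j, lam2) for lam2 in current[j])
                    for j in neighbors[i]
                )
                if not supported:
                    current[i].discard(lam)
                    changed = True
    return current


# Hypothetical example: a chain a - b - c where neighboring labels must differ.
labels = {"a": {"red"}, "b": {"red", "green"}, "c": {"green", "blue"}}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}


def differ(i, lam, j, lam2):
    return lam != lam2


result = discrete_relaxation(labels, neighbors, differ)
print(result)
```

• In this example "red" is discarded at b because the only label at a is "red", and "green" is then discarded at c because b is left with only "green"; the loop repeats until a full pass removes nothing.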