Nonnegative Matrix Factorization with Sparseness Constraints
S. Race
MA591R
Introduction to NMF
• Factor A ≈ WH (illustrated in the sketch below)
• A – m × n matrix of data
  – m non-negative scalar variables
  – the n measurements form the columns of A
• W – m × r matrix of “basis vectors”
• H – r × n coefficient matrix
  – describes how strongly each building block is present in the measurement vectors
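A minimal numpy sketch of the shapes involved (the dimensions and random data are illustrative, not from the slides):

    import numpy as np

    m, n, r = 100, 50, 5                 # variables, measurements, rank
    A = np.abs(np.random.randn(m, n))    # non-negative data matrix

    W = np.abs(np.random.randn(m, r))    # basis vectors form the columns
    H = np.abs(np.random.randn(r, n))    # one coefficient column per measurement

    approx = W @ H                       # A ≈ WH, same shape as A
    print(approx.shape)                  # (100, 50)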
Introduction to NMF (cont’d)
• Purpose:
  – “parts-based” representation of the data
  – data compression
  – noise reduction
• Examples:
  – term–document matrices
  – image processing
  – any data composed of hidden parts
Introduction to NMF (cont’d)
• Optimize the accuracy of the solution:
  – min ||A − WH||F subject to W, H ≥ 0
• We can drop the nonnegativity constraints by squaring the factors elementwise (see the sketch below):
  – min ||A − (W∘W)(H∘H)||F, where ∘ denotes the elementwise product
• Many options for the objective function
• Many options for the algorithm
  – W and H will depend on the initial choices
  – convergence is not always guaranteed
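A short sketch of both objectives in numpy (function names are mine, for illustration):

    import numpy as np

    def frob_objective(A, W, H):
        """Constrained form: ||A - WH||_F, with W, H >= 0 enforced elsewhere."""
        return np.linalg.norm(A - W @ H, "fro")

    def reparam_objective(A, U, V):
        """Unconstrained form: the factors W = U*U and H = V*V are
        non-negative by construction, so U and V are free variables."""
        return np.linalg.norm(A - (U * U) @ (V * V), "fro")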
Common Algorithms
• Alternating Least Squares
  – Paatero 1994
• Multiplicative Update Rules (see the sketch below)
  – Lee–Seung (Nature 1999; NIPS 2000)
  – used by Hoyer
• Gradient Descent
  – Hoyer 2004
  – Berry–Plemmons 2004
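A sketch of the Lee–Seung multiplicative updates for the Frobenius objective (the parameter defaults and the small eps safeguard are my choices):

    import numpy as np

    def nmf_multiplicative(A, r, iters=200, eps=1e-9, seed=0):
        """Lee-Seung multiplicative updates for min ||A - WH||_F, W, H >= 0.
        Each update is non-increasing in the objective."""
        rng = np.random.default_rng(seed)
        m, n = A.shape
        W = rng.random((m, r)) + eps
        H = rng.random((r, n)) + eps
        for _ in range(iters):
            H *= (W.T @ A) / (W.T @ W @ H + eps)   # update H with W fixed
            W *= (A @ H.T) / (W @ H @ H.T + eps)   # update W with H fixed
        return W, H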
Why is sparsity important?
• The nature of some data
  – text mining
  – disease patterns
• Better interpretation of results
• Storage concerns
Non-negative Sparse Coding I
• Proposed by Patrik Hoyer in 2002
• Add a penalty function to the objective to encourage sparseness
• Objective: min ||A − WH||F² + λ Σij f(Hij) subject to W, H ≥ 0 (evaluated in the sketch below)
• The parameter λ controls the trade-off between accuracy and sparseness
• f is strictly increasing: f(h) = h, i.e. the penalty λ Σ Hij, works
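A sketch of this penalized objective with f(h) = h (the function name is mine):

    import numpy as np

    def nnsc_objective(A, W, H, lam):
        """Reconstruction error plus an L1-style penalty on H.
        Since H >= 0, the penalty sum(H) equals the L1 norm of H."""
        recon = np.linalg.norm(A - W @ H, "fro") ** 2
        return recon + lam * H.sum()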
Sparse Objective Function
• The objective can always be decreased by scaling W up and H down
  – set W = cW and H = (1/c)H with c > 1: the product WH is unchanged, but the penalty on H shrinks (checked numerically below)
• Thus, on its own, this objective will simply yield the plain NMF solution
• A constraint on the scale of H or W is needed
  – fix the norm of the columns of W or the rows of H
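A quick numeric check of the scaling argument (random matrices; the values of λ and c are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    A, W, H = rng.random((20, 10)), rng.random((20, 4)), rng.random((4, 10))
    lam, c = 0.1, 10.0

    before = np.linalg.norm(A - W @ H, "fro")**2 + lam * H.sum()
    after = np.linalg.norm(A - (c * W) @ (H / c), "fro")**2 + lam * (H / c).sum()
    print(before > after)   # True: same reconstruction error, smaller penalty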
Non-negative Sparse Coding I
• Pros
  – simple, efficient
  – the multiplicative update rule is guaranteed to reach the global minimum for H (the objective is convex in H when W is fixed)
• Cons
  – sparseness is controlled only implicitly: the optimal λ is found by trial and error
  – sparseness is constrained only for H
NMF with sparseness constraints II
• First we need some way to define the sparseness of a vector
  – a vector with one nonzero entry is maximally sparse
  – a multiple of the vector of all ones, e, is minimally sparse
• CBS inequality: ||x||2 ≤ ||x||1 ≤ √n ||x||2, with the upper bound attained exactly by multiples of e and the lower bound by vectors with a single nonzero entry
• How can we combine these ideas?
Hoyer’s Sparseness Parameter
• sparseness(x) = (√n − ||x||1/||x||2) / (√n − 1)
  – where n is the dimensionality of x
• This measure indicates that we can control a vector’s sparseness by manipulating its L1 and L2 norms (see the sketch below)
[Figure: the sparseness function for vectors with n = 2]
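A direct implementation of the measure (a sketch; the function name is mine):

    import numpy as np

    def sparseness(x):
        """Hoyer's measure: 1 for a vector with a single nonzero entry,
        0 for a constant vector."""
        n = x.size
        return (np.sqrt(n) - np.abs(x).sum() / np.linalg.norm(x)) / (np.sqrt(n) - 1)

    print(sparseness(np.array([0.0, 0.0, 1.0])))   # 1.0, maximally sparse
    print(sparseness(np.array([1.0, 1.0, 1.0])))   # 0.0, minimally sparse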
Implementing Sparseness Constraints
• Now that we have an explicit measure of sparseness, how can we incorporate it into the algorithm?
• Hoyer: at each step, project each column of a matrix onto the nearest vector of the desired sparseness.
Hoyer’s Projection Algorithm
• Problem: given any vector x, find the closest (in the Euclidean sense) non-negative vector s with a given L1 norm and a given L2 norm
• We can easily solve this problem in the 3-dimensional case and extend the result.
Hoyer’s Projection Algorithm
• Set si = xi + (L1 − Σxi)/n for all i
• Set Z = {}
• Iterate (a Python sketch follows below):
  – Set mi = L1/(n − size(Z)) if i is not in Z, 0 otherwise
  – Set s = m + β(s − m), where β ≥ 0 solves the quadratic ||m + β(s − m)||2 = L2
  – If s is non-negative, we’re finished
  – Set Z = Z ∪ {i : si < 0}
  – Set si = 0 for all i in Z
  – Calculate c = (Σsi − L1)/(n − size(Z))
  – Set si = si − c for all i not in Z
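A numpy sketch of the projection, following the steps above (it assumes the L1/L2 pair is feasible, i.e. L2 ≤ L1 ≤ √n·L2):

    import numpy as np

    def project(x, L1, L2):
        """Closest non-negative vector to x with the given L1 and L2 norms."""
        n = x.size
        s = x + (L1 - x.sum()) / n           # project onto hyperplane sum(s) = L1
        Z = np.zeros(n, dtype=bool)          # coordinates fixed at zero
        while True:
            m = np.where(Z, 0.0, L1 / (n - Z.sum()))
            d = s - m
            # beta >= 0 solving ||m + beta*d||_2 = L2 (a quadratic in beta)
            a = (d * d).sum()
            b = 2.0 * (m * d).sum()
            c = (m * m).sum() - L2**2
            beta = (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)
            s = m + beta * d
            if (s >= 0).all():
                return s
            Z |= s < 0                       # zero out the negative coordinates
            s = np.where(Z, 0.0, s)
            s = np.where(Z, s, s - (s.sum() - L1) / (n - Z.sum()))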
The Algorithm in words
• Project x onto the hyperplane Σsi = L1
• Within this space, move radially outward from the center of the joint-constraint hypersphere (the point m) toward the current point s, until the L2 norm constraint is met
• If the result is non-negative, the destination is reached
• Else, set the negative values to zero and project to a new point in similar fashion
NMF with sparseness constraints
• Step 1: Initialize W and H to random positive matrices
• Step 2: If constraints apply to W, H, or both, project each column or row, respectively, to have an unchanged L2 norm and the desired L1 norm
NMF w/ Sparseness Algorithm
• Step 3: Iterate (a condensed sketch follows below)
  – If sparseness constraints on W apply:
    • set W = W − μW·(WH − A)H^T (a gradient step with step size μW)
    • project the columns of W as in Step 2
  – Else, take a standard multiplicative step
  – If sparseness constraints on H apply:
    • set H = H − μH·W^T(WH − A)
    • project the rows of H as in Step 2
  – Else, take a standard multiplicative step
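A condensed sketch of the full loop, reusing the project and sparseness helpers above. The fixed step size mu and the omission of Hoyer's adaptive step-size control are simplifications of mine:

    import numpy as np

    def nmf_sparse(A, r, sW=None, sH=None, mu=1e-3, iters=500, seed=0):
        """NMF with optional sparseness constraints on the columns of W
        and/or the rows of H. sW, sH in [0, 1] are desired sparseness
        values; None means use a multiplicative step for that factor."""
        rng = np.random.default_rng(seed)
        m, n = A.shape
        W, H = rng.random((m, r)), rng.random((r, n))

        def target_l1(v, s):
            # L1 norm giving sparseness s at the vector's current L2 norm
            root = np.sqrt(v.size)
            return np.linalg.norm(v) * (root - s * (root - 1))

        for _ in range(iters):
            if sW is not None:
                W = W - mu * (W @ H - A) @ H.T            # gradient step on W
                for j in range(r):                        # project each column
                    W[:, j] = project(W[:, j], target_l1(W[:, j], sW),
                                      np.linalg.norm(W[:, j]))
            else:
                W *= (A @ H.T) / (W @ H @ H.T + 1e-9)     # multiplicative step
            if sH is not None:
                H = H - mu * W.T @ (W @ H - A)            # gradient step on H
                for i in range(r):                        # project each row
                    H[i, :] = project(H[i, :], target_l1(H[i, :], sH),
                                      np.linalg.norm(H[i, :]))
            else:
                H *= (W.T @ A) / (W.T @ W @ H + 1e-9)     # multiplicative step
        return W, H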
Advantages of the New Method
• Sparseness is controlled explicitly, with a parameter that is easily interpreted
• The sparseness of W, H, or both can be constrained
• The number of iterations required grows very slowly with the dimensionality of the problem
[Figure: iterations required vs. problem dimensionality. Dotted lines show the minimum and maximum number of iterations; the solid line shows the average.]
An Example from Hoyer’s Work
Text Mining Results
• Text to Matrix Generator (TMG)
  – Dimitrios Zeimpekis and E. Gallopoulos, University of Patras
  – http://scgroup.hpclab.ceid.upatras.gr/scgroup/Projects/TMG/
• NMF with sparseness constraints code from Hoyer’s web page