Matrix Completion with Queries Property of Natali Ruchansky Natali Ruchansky

Property of Natali Ruchansky
Matrix Completion
with Queries
Natali Ruchansky, Mark Crovella, Evimaria Terzi
Can you guess the picture?
What about now?
And now?
Salvador Domingo Felipe Jacinto Dalí i Domènech
How did you do it?
For each input image: the available information, and our estimate.
Available information: for most, there is too little information to recognize shapes or patterns.
Our estimate: "I'm not sure." An arbitrary guess.
Available information: recognizable human features (ear, eyebrow shape, and facial contour).
Our estimate: "I know, a human face." (not Van Gogh)
Our estimate: "I know this mustache! My friend Salvador Dali!"
How Much and Which Information?
So the question is, if we start at this image:
How much and which information do I need to add
so that my particular algorithm can infer the image?
If we can answer:
How much and which information do I need to add
so that my particular algorithm can infer the image?
then we can:
1. Choose which information to add, tailored to the
particular reconstruction algorithm.
2. Reconstruct based on this information.
The example of reconstructing Dali is an instance of the
problem of Matrix Completion:
Given a partially-observed matrix M,
fill in the missing entries.
In particular, the version applied to real-world data is
Low Rank Matrix Completion:
Given a partially-observed matrix M of low rank r,
fill in the missing entries.
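To make the problem statement concrete, here is a minimal sketch of what an input to low rank matrix completion might look like. The sizes, the random factors, and the 50% observation rate are illustrative assumptions, not values from the talk.

```python
import numpy as np

# Illustrative assumptions: a 6 x 5 matrix of rank 2, about half observed.
rng = np.random.default_rng(0)
m, n, r = 6, 5, 2

# Build a rank-r matrix as a product of two thin factors.
X = rng.standard_normal((m, r))
Y = rng.standard_normal((r, n))
M = X @ Y

# mask[i, j] is True where the entry is observed; hidden entries become NaN.
mask = rng.random((m, n)) < 0.5
M_observed = np.where(mask, M, np.nan)

print(M_observed)
```

The completion task is then to recover the NaN entries of `M_observed` using only the visible entries and the knowledge that the rank is `r`.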
Completion of what?
• Yelp users rate restaurants
[Figure: ratings matrix indexed by users and restaurants]
But a given user has not visited all restaurants, so the matrix is partially observed.
• Traffic counters measure traffic on roads
[Figure: traffic matrix indexed by source and destination]
But counters do not exist on all roads, so the matrix is partially observed.
• Biologists measure interactions between proteins
[Figure: interaction matrix indexed by protein and protein]
But they cannot exhaustively run all experiments, so the matrix is partially observed.
https://www.telegeography.com/telecom-maps/global-traffic-map.1.html
And many more instances of partially observed data…
Statistical Matrix Completion
Traditional approaches assume:
1. A random distribution of observations
2. At least n r log(n) observations
With (at least) these assumptions, statistical matrix
completion methods pose the problem as an optimization and
find the best solution to match the visible information.
[Figure: an input that meets the assumptions, and its reconstruction]
The challenge with these assumptions is that in real data:
1. The distribution is often not random
2. Very few entries are actually known.
Example: known ratings: 9e7; required n r log(n): 2.5e8.
That is ≈160,000,000 fewer entries than required.
[Figure: real observed data vs. best guess. The best guess matches on Ω (the observed entries), not elsewhere.]
Our Question
How can we design a single
querying and matrix completion
algorithm that minimizes both the
reconstruction error and the number of queries?
We call this the Active Completion problem.
In the budgeted version, the number of queries is fixed to a budget b and the reconstruction error is minimized.
With great power…
Many data owners are in the powerful position to add
additional observations:
• Yelp can ask some users to rate some restaurants
• Cities can install traffic counters
• Biologists can experiment with a particular protein pair
How can we make the most of the limited budget of queries?
The Answer
We construct an algorithm called Order&Extend
that is the first to integrate a querying strategy into
its matrix completion algorithm.
It is able to select the small number of queries
needed to find an accurate completion.
Our Approach
The key to our approach is viewing matrix completion as a
sequence of linear systems.
This allows us to identify:
1. Parts of the matrix that can be recovered given the observations
2. Other parts that cannot, due to insufficient information
3. The additional entries needed to recover those areas.
Note that our algorithm will therefore not fill in areas it lacks the information to recover.
It will only estimate the parts it can.
MC as Linear Systems
Write the data M = XY as a product of factors,
where M is m × n, X is m × r, and Y is r × n.
Each entry of M combines a row of X (x_i or x_{i'}) with a column of Y (y_j). For rank 2:
M_{ij} = x_{i1} y_{1j} + x_{i2} y_{2j}
M_{i'j} = x_{i'1} y_{1j} + x_{i'2} y_{2j}
Known: x_i, x_{i'}, M_{ij}, and M_{i'j}. Unknown: y_j.
Two equations in two variables.
Solve for y_j.
Iteratively solve systems of this form to
fill in X and Y,
then multiply to get the
estimate \tilde{M} = XY.
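As a rough illustration of one step of this process, the sketch below solves a single rank-2 system for a column of Y using NumPy. The function name `solve_column` and all numeric values are made up for the example; this is not the Order&Extend code.

```python
import numpy as np

def solve_column(x_rows, m_entries):
    """Solve A y = b for one column y_j of Y.

    x_rows:    r already-known rows of X (here x_i and x_i'), shape (r, r)
    m_entries: the r observed entries of column j in those same rows
    """
    A = np.asarray(x_rows, dtype=float)
    b = np.asarray(m_entries, dtype=float)
    return np.linalg.solve(A, b)

# Hypothetical rank-2 data: two known rows of X and a hidden column y_j.
x_i = [1.0, 2.0]
x_i_prime = [3.0, 1.0]
y_true = np.array([2.0, -1.0])

# The two observed entries M_ij and M_i'j generated by the hidden column.
M_ij = float(np.dot(x_i, y_true))
M_ipj = float(np.dot(x_i_prime, y_true))

y_j = solve_column([x_i, x_i_prime], [M_ij, M_ipj])
print(y_j)
```

Repeating this for rows of X (with known columns of Y) and for further columns of Y fills in both factors one system at a time.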
How do we know when and
what we need to query?
Incomplete Systems
M_{ij} = x_{i1} y_{1j} + x_{i2} y_{2j}
M_{i'j} = x_{i'1} y_{1j} + x_{i'2} y_{2j}
With x_i, x_{i'}, M_{ij}, and M_{i'j} known and y_j unknown:
two equations in two variables.
Now suppose M_{i'j} was not observed in the input data.
Then M_{i'j} is unknown as well: two equations in three variables.
Query: what is the value of M_{i'j}?
Once the query is answered, we again have two equations in two unknowns,
so we can solve for y_j.
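The "query when the system is incomplete" idea can be sketched as follows. The function `solve_with_queries` and its `oracle` argument (a stand-in for asking the data owner for one entry) are hypothetical names invented for this example, and which missing entry gets queried is chosen naively here rather than by the strategy from the paper.

```python
import numpy as np

def solve_with_queries(X_known, observed, oracle, r=2):
    """Solve for column y_j, querying entries while the system is incomplete.

    X_known:  dict mapping row index -> known row of X (length r)
    observed: dict mapping row index -> observed value of M[row, j]
    oracle:   callable row -> value of M[row, j]; stands in for a real query
    Returns (y_j, list of rows that had to be queried).
    """
    observed = dict(observed)  # do not modify the caller's data
    rows = [i for i in X_known if i in observed]
    queried = []
    for i in X_known:
        if len(rows) >= r:
            break  # enough equations already
        if i not in observed:
            observed[i] = oracle(i)  # fill the gap with a query
            rows.append(i)
            queried.append(i)
    if len(rows) < r:
        raise ValueError("fewer than r known rows of X are available")
    rows = rows[:r]
    A = np.array([X_known[i] for i in rows], dtype=float)
    b = np.array([observed[i] for i in rows], dtype=float)
    return np.linalg.solve(A, b), queried

# Hypothetical data: M_ij (row 0) is observed, M_i'j (row 1) is not.
X_known = {0: [1.0, 2.0], 1: [3.0, 1.0]}
y_true = np.array([2.0, -1.0])
oracle = lambda i: float(np.dot(X_known[i], y_true))

y_j, queried = solve_with_queries(X_known, {0: oracle(0)}, oracle)
print(y_j, queried)
```

In this toy run exactly one entry, the one in row 1, has to be queried before the system becomes solvable.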
Unstable systems
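Consider the system X y = M with
X = [ 1    1/2 ]
    [ 1/2  1/3 ]
For M = (3/2, 1), the solution is y = (0, 3).
For M' = (3/2, 5/6), the solution is y' = (1, 1).
A small change in M produces a large change in the solution.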
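The numbers in this example can be checked directly. The sketch below solves both systems with NumPy and compares how much the right-hand side moved with how much the solution moved. The condition number is printed only as a familiar diagnostic: it is an assumption of this sketch, not the stability test used by Order&Extend, which the talk describes as distinct from ill-conditioning.

```python
import numpy as np

# The matrix and the two right-hand sides from the example.
X = np.array([[1.0, 1 / 2],
              [1 / 2, 1 / 3]])
M = np.array([3 / 2, 1.0])
M_prime = np.array([3 / 2, 5 / 6])

y = np.linalg.solve(X, M)              # approximately (0, 3)
y_prime = np.linalg.solve(X, M_prime)  # approximately (1, 1)

change_in = np.linalg.norm(M - M_prime)   # how far the data moved
change_out = np.linalg.norm(y - y_prime)  # how far the solution moved
cond = np.linalg.cond(X)                  # familiar proxy, see lead-in

print(y, y_prime)
print(change_in, change_out, cond)
```

A change of 1/6 in a single entry of M moves the solution by more than ten times that amount, which is the kind of sensitivity the example points at.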
In the paper…
1. How can we detect unstable systems?
2. How can we mitigate unstable systems?
Minimizing Queries
Encountering an incomplete
or unstable system
means the algorithm needs
to query.
How can we also keep
the number of queries asked to a minimum?
By manipulating the order in which we solve the systems.
(Hence the ‘order’ in Order&Extend)
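To see why the solving order can change the number of queries, consider the toy counting model below. It is a deliberately simplified assumption of this sketch and not the ordering procedure from the paper. The first r items in the order are treated as a free starting basis. Each later row or column needs r observed entries shared with already-solved items of the other kind, and each missing one costs a query. The function name `count_queries` and the 4 × 4 mask are invented for illustration.

```python
import numpy as np

def count_queries(mask, order, r):
    """Count queries under the toy model described above.

    mask:  boolean array, mask[i, j] is True when M[i, j] is observed
    order: sequence of ("row", i) or ("col", j) items to solve in turn
    r:     assumed rank
    """
    solved_rows, solved_cols, queries = set(), set(), 0
    for step, (kind, idx) in enumerate(order):
        if step >= r:  # the first r items form the starting basis
            if kind == "col":
                partners = solved_rows
                have = int(sum(mask[i, idx] for i in partners))
            else:
                partners = solved_cols
                have = int(sum(mask[idx, j] for j in partners))
            if len(partners) < r:
                raise ValueError("order leaves too few solved partners")
            queries += max(0, r - have)  # each missing equation is a query
        (solved_cols if kind == "col" else solved_rows).add(idx)
    return queries

# Hypothetical observation pattern (rows r0..r3, columns c0..c3).
mask = np.array([[1, 1, 1, 0],
                 [1, 1, 0, 0],
                 [1, 1, 1, 1],
                 [1, 0, 0, 1]], dtype=bool)

order_a = [("row", 0), ("row", 1), ("col", 0), ("col", 1),
           ("row", 2), ("col", 2), ("row", 3), ("col", 3)]
order_b = [("row", 0), ("row", 1), ("col", 2), ("col", 3),
           ("col", 0), ("col", 1), ("row", 2), ("row", 3)]

print(count_queries(mask, order_a, r=2))
print(count_queries(mask, order_b, r=2))
```

With the same observed entries, the first order needs one query and the second needs three, simply because the second visits poorly covered columns before enough rows have been solved.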
Takeaway
Observed data is typically:
- not random
- sparse
…But we can query!
(minimally!)
Option 1: Independent
Decide what to query independently of how you complete.
[Figure: query limit = 1]
Option 2: Integrated
Decide what to query based on how you complete.
Who is guessing: a normal person or an artist?
Our algorithm Order&Extend is the first one composed of
1. a querying strategy
tailored to
2. a completion algorithm
This integrated nature enables Order&Extend to:
- carefully select a small number of queries,
so that the completion algorithm can
- recover the matrix with high accuracy.
It also allows Order&Extend to output partial completions under strict limits
on the number of allotted queries.
A Flavor
(of internet traffic data)
For full and accurate completion, Order&Extend asks 13k queries,
while other algorithms do not achieve comparable error even with <40k queries.
Deeper discussion of:
• Matrix completion as a sequence of linear systems
• Sequence of linear systems as graph propagation
• Predicting unstable systems
  • distinction from ill-conditioning
• Efficient computation of stability checks
• Finding a good solving order
  • through the lens of graph propagation
Experiments:
• Comparison with Matrix Completion algorithms
  • extended with a querying ability
• Approximate low-rank
• Exact low-rank
Read the paper!
Thank you.
(and read the paper)
from the book Dali’s Mustache