Sparse Representation, Building Dictionaries, and Church Street
Lily Chan
[Figure: fully sampled image vs. 6X undersampled vs. 6X undersampled with CS reconstruction]
Overview
A. Basic Compressed Sensing Theory
B. Building Good Dictionaries
C1. Background Subtraction
C2. Estimating Crowd Size
A. Basic Compressed Sensing Theory
Compressed Sensing
Concepts from multiple academic disciplines:
– Linear Algebra and Systems
– Statistics and Probability
– Signals and Systems
– Computer Science
– Mathematics
Motivations for CS
– Faster sampling
– Larger dynamic range
– Higher-dimensional data
– Lower energy consumption
– New sensing modalities
Applications
– Photography
– Infrared Cameras
– Facial Recognition
– Pediatric MRI (time reduced by ~10x)
– Etc.
Compressed Sensing
• Compressive Sensing is based on the observation that many real-world signals and images are either sparse themselves or sparse in some basis or frame (i.e. compressible).
• Acquires and reconstructs signals using a mathematical theory focused on measuring finite-dimensional signals in R^n.
• Enables data to be directly sensed in compressed form (at a lower sampling rate than traditional Nyquist sampling), providing a sparse or compressible representation for signals.
Compressed Sensing
In CS we seek to recover an n×1 vector x given m measurements y, with m << n and a dictionary A:
y = Ax
[Figure: the underdetermined system y = Ax, with A an m×n dictionary matrix]
L0 Minimization (L0 Norm)
• The L0 norm returns the number of nonzero elements in each potential solution.
• Finding the sparsest solution (the solution with the least number of nonzero elements) to the system by minimizing the L0 norm is the exact result desired for our system.
• Though this method sounds straightforward, it is very expensive to use and requires analysis of all possible arrangements of the k nonzero elements of the signal.
» Very impractical
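The combinatorial cost is easy to see in code. The sketch below (an illustrative Python implementation, not part of the Matlab toolboxes discussed later) tries every size-k support and keeps the first one that reproduces y exactly; the C(n, k) outer loop is precisely what makes L0 minimization impractical beyond toy sizes.

```python
import itertools
import numpy as np

def l0_brute_force(A, y, k, tol=1e-8):
    """Search every size-k support for an exact solution to y = A x.
    Cost grows as C(n, k), so this only works for tiny problems."""
    n = A.shape[1]
    for support in itertools.combinations(range(n), k):
        sub = A[:, list(support)]
        coeffs, *_ = np.linalg.lstsq(sub, y, rcond=None)
        if np.linalg.norm(sub @ coeffs - y) < tol:   # exact fit found
            x = np.zeros(n)
            x[list(support)] = coeffs
            return x
    return None

# Toy example: 5 measurements of a 2-sparse length-8 signal.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 8))
x_true = np.zeros(8)
x_true[[1, 6]] = [1.5, -2.0]
y = A @ x_true
x_hat = l0_brute_force(A, y, k=2)
```

Even at this size the search visits up to C(8, 2) = 28 supports; at realistic signal lengths the count is astronomical.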
Sparsity and the L1 Norm
• Norms measure the strength of a signal (the size of the error/residual of the system).
• The goal is to find the x* ∈ A that minimizes ‖x − x*‖p, the approximation error under an ℓp norm.
• The larger p is, the more evenly the error is spread among the two coefficients.
• Goal: obtain the sparsest approximation of a point in 2-D space by a point in a 1-D subspace.
• L1 provides the most practical sparsest approximation next to L0.
A
CS Software available
Open source software is now available for many applications of different CS
methods.
– Most of this software is written in C/C++ and Matlab.
– L1-magic is a popular Matlab-based collection of CS algorithms based on
standard interior-point methods.
– Other software available include
• NESTA
• TFOCS
• SURE for Matrix Estimation
• CurveLab
• ChirpLab
• SPARCO
• TWIST
• SparseLab
• etc …
SMALLbox
•
SMALLbox: Sparse Models, Algorithms and Learning for Large-scale data
•
Purpose: To provide a unifying interface that enables an easy way of
comparing dictionary learning algorithms through an API that enables
interoperability between existing toolboxes.
•
Current Functional Examples
–
–
–
–
•
Image Denoising (with comparisons of different algorithms)
Automatic Music Transcription
Representation of image with patches from another one (Pierre Villars)
Incoherent Dictionary Learning
Download SMALLbox at: https://code.soundsoftware.ac.uk/projects/smallbox
A
Image Denoising Example
Denoising Problem:
KSVD Denoised Image, PSNR = 32.35 dB,
Time = 8.24 s
Given N noisy measurements,
y =Ax+v,
build dictionary A and recover x.
y = Ax + v,
where v is noise.
RLSDLA Denoised Image, PSNR = 32.38 dB,
Time = 7.60 s
Denoising Flow Chart
Generate
initial
dictionary A
Note: The dictionary update
state is done one atom
(column) at a time. Other nonzero data samples that do not
use the atom (non-orthogonal
to the atom) are fixed.
Update
Dictionary A
(Dictionary
Learning)
Denoise by
orthogonal
pursuit
(Patch
Denoising)
Image reconstruction
Image Denoising Results
KSVD vs. RLSDLA
Original
Noisy Image, PSNR = 22.23 dB
KSVD Denoised Image, PSNR = 32.35 dB,
Time = 8.24 s
RLSDLA Denoised Image, PSNR = 32.38 dB,
Time = 7.60 s
KSVD Dictionary
RLSDLA Dictionary
C1. Background Subtraction
M
M
Background Subtraction
Under rather weak assumptions,
the Principal Component Pursuit (PCP) estimate
solving
exactly recovers the low-rank L0 and the sparse
S0.*
* Candès E., X. Li, Y. Ma, and J. Wright, “Robust Principal Component
Analysis”, Journal of the ACM, volume 58, no. 3, May 2011.
Background Subtraction
• If we stack the video frames as columns of a matrix M, then the
low-rank component L0 naturally corresponds to the stationary
background and the sparse component S0 captures the moving
objects in the foreground.
Background Subtraction
• If we stack the video frames as columns of a matrix M, then the
low-rank component L0 naturally corresponds to the stationary
background and the sparse component S0 captures the moving
objects in the foreground.
• Foreground objects, such as cars or pedestrians, generally
occupy only a fraction of the image pixels and hence can be
treated as sparse errors.
Background Subtraction
Background Subtraction
An augmented Lagrange multiplier (ALM) algorithm is used in the
TFOCS toolbox to solve the convex PCP problem.*
* Candès E., X. Li, Y. Ma, and J. Wright, “Robust Principal Component
Analysis”, Journal of the ACM, volume 58, no. 3, May 2011.
Background Subtraction
Background Subtraction
• ALM achieves much higher accuracy than APG (Accelerated
Proximal Gradient, in fewer iterations. *
• It works stably across a wide range of problem settings with no
tuning of parameters. *
• ALM has an appealing (empirical) property: the rank of the iterates
often remains bounded by rank(L0) throughout the optimization,
allowing them to be computed especially efficiently. APG, on the
other hand, does not have this property. *
* Candès E., X. Li, Y. Ma, and J. Wright, “Robust Principal Component
Analysis”, Journal of the ACM, volume 58, no. 3, May 2011.
C2. Estimating Crowd Size
Using Background Subtraction
•
Objective: Estimate the number of objects passing through a video
•
Video Locations
– UVM Davis Center
– Church Street Marketplace
•
Total Video Time Analyzed: 119 minutes
•
Total Actual Objects in all videos analyzed: 2638
•
Concepts Used
– Compressed Sensing
• Dictionary Learning
• Background Subtraction
– Kalman Filters for object tracking
•
Toolboxes
– TFOCS (Templates for First-Order Conic Solvers )
– Computer Vision System Toolbox from MATLAB
Estimating Crowd Size
Using Background Subtraction
Estimating Crowd Size
Using Background Subtraction
Automatic Object Counter
100
90
Percent Accurate
80
70
60
Analysis without BS
50
Analysis with BS
40
30
20
10
0
UVM_1
UVM_3
UVM_4
UVM_5
UVM_6
Video
UVM_7
L1020306
L1020307
Estimating Crowd Size
Using Background Subtraction
•
Background Subtraction significantly increases counting accuracy in videos
with background objects that are constantly moving:
– Natural Environments with unpredictable factors
– Trees
– Escalators
•
If an object (or a group of objects) enters the video but stops moving, the
algorithm will eventually count them as part of the background after a few
frames until they start moving again, at which point they will be considered a
new object.
•
If a group of people walk at the same pace and travel in a tight pack, the current
program will consider them one big object travelling through the video.
•
Tracking accuracy is greatly improved when there are less inanimate objects in
the video that could provide occlusion for the moving objects.
•
There is currently no commercial technology available to count large crowds
that is reliably accurate as of yet.
Estimating Crowd Size Using Background
Subtraction :
How to Run the program
The current automatic object counter is designed to analyze a folder of videos
and output a comma-separated value file with the name of each video and the
count from the analysis.
Steps:
1)
2)
3)
4)
5)
Install the TFOCS toolbox onto your computer:
http://cvxr.com/tfocs/
Run AutoObjectCounter.m
Choose the folder to be analyzed
The analysis takes about 1 minute per second of video analyzed for
.avi formatted videos.
Once the analysis is complete, a VidCountRslts.csv file will be in the
folder from step 3 containing the names of the videos in the folder
with the corresponding counts of each video.
Crowd Estimation Lessons Learned
•
Video accuracy is best when the video taken is stable, hence a tripod is highly
recommended.
•
Taking video using a digital camera with .avi format output takes less memory,
has faster processing time, and is easier to convert than using an iphone with
.mov format output.
•
Ensure the computer being used for processing has at least 8GB of RAM.
•
Video segments longer than about 25 seconds may crash Matlab and your OS,
depending on the individual processing power of the computer.
•
Recommended video segment time is between 10 to 20 seconds.
•
Shorter video segments allow for easier manual counting of moving objects.
•
Talk to the mall administrators before taking videos inside the Church Street
mall, otherwise the mall police will kick you out. Be discreet about taking
videos, some people may become aggressive if they find you recording them.
Future Improvements
• Coding Efficiency
– Improving the Matlab code for efficiency would save computing
time and potentially allow for longer video segments without
crashing the computer or requiring large amounts of processing
power.
• Integrate Feature Recognition
– The tracking of people would be more accurate for crowds if
feature recognition was integrated to enable tracking of
individual people instead of blobs.
• Frame to Frame Shading Stabilization
– Stabilization of background color and shading of the video from
frame to frame would eliminate false counts.
References
Papers
•
Candès E., “Compressive Sampling,” Proceedings of International Congress of Mathematicians, Madrid,
Spain, 2006.
•
Fornasier M., and Rauhut H., “ Compressive sensing,” Handbook of Mathematical Methods in
Imaging. , Springer , Heidelberg, Germany , ((2011)).
•
J. Wright, et al., “Robust Face Recognition via Sparse Representation”, IEEE TRANS. PAMI, Mar 2006.
•
M.A. Davenport, M.F. Duarte, Y.C. Eldar, and G. Kutyniok, “Introduction to Compressed Sensing,”
Compressed Sensing: Theory and Applications, Cambridge University Press, 2012.
•
D. Barchiesi and M. Plumbley, “Learning Incoherent Dictionaries for Sparse Approximation Using Iterative
Projections and Rotations,” IEEE Trans. Signal Process., vol. 61, no. 8, pp. 2065, Apr. 2013.
•
Y. Zhang, “Theory of Compressive Sensing via L1 Minimization: A Non-Rip Analysis and Extensions,”
Rice University, Houston, TX, Tech. Rep., 2008.
•
I. Ram, M. Elad, and I. Cohen, “The RTBWT Frame – Theory and Use for Images”, working draft to be
submitted soon.
References
Papers
•
Z. Lin, M. Chen, L. Wu, and Y. Ma. The augmented Lagrange multiplier method for exact recovery of a
corrupted low-rank matrices. Mathematical Programming, submitted, 2009.
•
Donoho, D.L.: Compressed Sensing. IEEE Trans. Info. Theory 52(4) (2006) 1289–1306
•
I. Ram, M. Elad, and I. Cohen, “Image Processing using Smooth Ordering of its Patches”, to appear in
IEEE Transactions on Image Processing.
•
M. Elad, “Sparse and Redundant Representation Modeling — What Next?”, IEEE Signal Processing
Letters, Vol. 19, No. 12, Pages 922-928, December 2012.
•
A.M. Bruckstein, D.L. Donoho, and M. Elad, “From Sparse Solutions of Systems of Equations to Sparse
Modeling of Signals and Images”, SIAM Review, Vol. 51, No. 1, Pages 34-81, February 2009.
•
Candès E., X. Li, Y. Ma, and J. Wright, “Robust Principal Component Analysis”, Journal of the ACM,
volume 58, no. 3, May 2011.
References
Resources
•
Compressed Sensing Audio Demonstration:
http://sunbeam.ece.wisc.edu/csaudio/
•
SMALLbox:
https://code.soundsoftware.ac.uk/projects/smallbox
•
Compressed Sensing Video Lectures
–
Low-rank modeling
–
Matrix Completion via Convex Optimization:
Theory and Algorithms
–
An Overview of Compressed Sensing and
Sparse Signal Recovery via L1 minimization
–
L1 Minimization
–
Basics of probability and statistics for Machine Learning
http://videolectures.net/mlss2011_candes_lowrank/
http://videolectures.net/mlss09us_candes_mccota/
http://videolectures.net/mlss09us_candes_ocsssrl1m/
http://videolectures.net/nips09_bach_smm/
http://videolectures.net/bootcamp07_keller_bss/
•
Least Squares Estimates:
http://www.khanacademy.org
•
Compressive Sensing Resources:
http://dsp.rice.edu/cs
•
TFOCS Toolbox:
http://cvxr.com/tfocs/
•
Computer Vision System Toolbox from MATLAB
http://www.mathworks.com/products/computer-vision/
Download