Sparse Representation, Building Dictionaries, and Church Street
Lily Chan

(Figure: MRI reconstructions – fully sampled; 6X undersampled; 6X undersampled with CS reconstruction.)

Overview
A. Basic Compressed Sensing Theory
B. Building Good Dictionaries
C1. Background Subtraction
C2. Estimating Crowd Size

A. Basic Compressed Sensing Theory

Compressed Sensing
• Draws on concepts from multiple academic disciplines:
– Linear Algebra and Systems
– Statistics and Probability
– Signals and Systems
– Computer Science
– Mathematics
• Motivations for CS:
– Faster sampling
– Larger dynamic range
– Higher-dimensional data
– Lower energy consumption
– New sensing modalities
• Applications:
– Photography
– Infrared cameras
– Facial recognition
– Pediatric MRI (scan time reduced by ~10x)
– Etc.

Compressed Sensing
• Compressive sensing is based on the observation that many real-world signals and images are either sparse themselves or sparse in some basis or frame (i.e. compressible).
• Acquires and reconstructs signals using a mathematical theory focused on measuring finite-dimensional signals in R^n.
• Enables data to be sensed directly in compressed form (at a sampling rate lower than the traditional Nyquist rate), providing a sparse or compressible representation of the signals.

Compressed Sensing
In CS we seek to recover an n×1 vector x given m measurements y, with m << n, and a dictionary A:
y = Ax
(Figure: block diagram of y = Ax, with the short, wide m×n dictionary A mapping the sparse x to the measurements y.)

L0 Minimization (L0 Norm)
• The L0 norm returns the number of nonzero elements in a potential solution.
• Minimizing the L0 norm finds the sparsest solution (the one with the fewest nonzero elements), which is exactly the result desired for our system.
• Though this method sounds straightforward, it is very expensive to use: it requires analyzing all possible arrangements of the k nonzero elements of the signal.
» Very impractical

Sparsity and the L1 Norm
• Norms measure the strength of a signal (the size of the error/residual of the system).
• The goal is to find the x* in A that minimizes ||x − x*||_p, the approximation error in the Lp norm.
• The larger p is, the more evenly the error is spread among the coefficients.
• Goal: obtain the sparsest approximation of a point in 2-D space by a point in a 1-D subspace.
• L1 provides the most practical sparse approximation next to L0.

CS Software Available
Open-source software is now available for many applications of different CS methods.
– Most of this software is written in C/C++ and MATLAB.
– L1-magic is a popular MATLAB-based collection of CS algorithms based on standard interior-point methods.
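The recovery problem y = Ax above is what toolboxes such as L1-magic solve by L1 minimization. The following is an illustrative sketch only: the deck's tools are MATLAB, and this stand-in uses greedy orthogonal matching pursuit rather than an interior-point L1 solver; the dimensions and the random Gaussian dictionary are assumptions for the demo.

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal matching pursuit: greedily recover a k-sparse x with y = A @ x."""
    n = A.shape[1]
    support = []                       # indices of the chosen atoms (columns of A)
    x = np.zeros(n)
    residual = y.astype(float).copy()
    for _ in range(k):
        # pick the atom most correlated with the current residual
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        # least-squares fit of y on the atoms chosen so far
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        x[:] = 0.0
        x[support] = coef
        residual = y - A @ x
    return x

rng = np.random.default_rng(0)
n, m, k = 64, 24, 3                           # ambient dim, measurements (m << n), sparsity
A = rng.standard_normal((m, n)) / np.sqrt(m)  # random Gaussian dictionary (assumed for demo)
x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
y = A @ x_true
x_hat = omp(A, y, k)
print(np.linalg.norm(x_hat - x_true))         # near zero when recovery succeeds
```

With m = 24 random measurements of a 3-sparse length-64 signal, the greedy selection almost always finds the true support, illustrating why far fewer than n samples can suffice.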
– Other available software includes:
• NESTA
• TFOCS
• SURE for Matrix Estimation
• CurveLab
• ChirpLab
• SPARCO
• TwIST
• SparseLab
• etc.

SMALLbox
• SMALLbox: Sparse Models, Algorithms and Learning for Large-scale data.
• Purpose: to provide a unifying interface for comparing dictionary learning algorithms, through an API that enables interoperability between existing toolboxes.
• Current functional examples:
– Image denoising (with comparisons of different algorithms)
– Automatic music transcription
– Representation of one image with patches from another (Pierre Villars)
– Incoherent dictionary learning
• Download SMALLbox at: https://code.soundsoftware.ac.uk/projects/smallbox

Image Denoising Example
Denoising problem: given N noisy measurements y = Ax + v, where v is noise, build the dictionary A and recover x.

Denoising Flow Chart
1) Generate initial dictionary A
2) Update dictionary A (dictionary learning)
3) Denoise by orthogonal pursuit (patch denoising)
4) Image reconstruction
Note: the dictionary update stage is done one atom (column) at a time; data samples that do not use that atom are held fixed during its update.

Image Denoising Results: KSVD vs. RLSDLA
(Figure: original image; noisy image, PSNR = 22.23 dB; KSVD denoised image, PSNR = 32.35 dB, time = 8.24 s; RLSDLA denoised image, PSNR = 32.38 dB, time = 7.60 s; KSVD and RLSDLA dictionaries.)

C1. Background Subtraction

Background Subtraction
Under rather weak assumptions, the Principal Component Pursuit (PCP) estimate, obtained by solving
minimize ||L||_* + λ||S||_1 subject to L + S = M,
exactly recovers the low-rank L0 and the sparse S0.*
* Candès E., X. Li, Y. Ma, and J. Wright, "Robust Principal Component Analysis", Journal of the ACM, vol. 58, no. 3, May 2011.
Background Subtraction
• If we stack the video frames as columns of a matrix M, then the low-rank component L0 naturally corresponds to the stationary background, and the sparse component S0 captures the moving objects in the foreground.
• Foreground objects, such as cars or pedestrians, generally occupy only a fraction of the image pixels and hence can be treated as sparse errors.

Background Subtraction
An augmented Lagrange multiplier (ALM) algorithm is used in the TFOCS toolbox to solve the convex PCP problem.*
* Candès E., X. Li, Y. Ma, and J. Wright, "Robust Principal Component Analysis", Journal of the ACM, vol. 58, no. 3, May 2011.

Background Subtraction
• ALM achieves much higher accuracy than APG (Accelerated Proximal Gradient), and in fewer iterations.*
• It works stably across a wide range of problem settings with no tuning of parameters.*
• ALM has an appealing (empirical) property: the rank of the iterates often remains bounded by rank(L0) throughout the optimization, allowing them to be computed especially efficiently. APG, on the other hand, does not have this property.*
* Candès E., X. Li, Y. Ma, and J. Wright, "Robust Principal Component Analysis", Journal of the ACM, vol. 58, no. 3, May 2011.

C2. Estimating Crowd Size
Estimating Crowd Size Using Background Subtraction
• Objective: estimate the number of objects passing through a video.
• Video locations:
– UVM Davis Center
– Church Street Marketplace
• Total video time analyzed: 119 minutes
• Total actual objects in all videos analyzed: 2638
• Concepts used:
– Compressed sensing
• Dictionary learning
• Background subtraction
– Kalman filters for object tracking
• Toolboxes:
– TFOCS (Templates for First-Order Conic Solvers)
– Computer Vision System Toolbox from MATLAB

(Figure: bar chart of automatic object counter accuracy, in percent, with and without background subtraction, for videos UVM_1 and UVM_3–UVM_7, L1020306, and L1020307.)

Estimating Crowd Size Using Background Subtraction
• Background subtraction significantly increases counting accuracy in videos with background objects that are constantly moving:
– natural environments with unpredictable factors, such as trees
– escalators
• If an object (or a group of objects) enters the video but stops moving, the algorithm will eventually count it as part of the background after a few frames; when it starts moving again, it is counted as a new object.
• If a group of people walk at the same pace and travel in a tight pack, the current program considers them one big object travelling through the video.
• Tracking accuracy is greatly improved when there are fewer inanimate objects in the video that could occlude the moving objects.
• There is currently no commercial technology that can count large crowds with reliable accuracy.

Estimating Crowd Size Using Background Subtraction: How to Run the Program
The current automatic object counter is designed to analyze a folder of videos and output a comma-separated-value file with the name of each video and the count from the analysis.
Steps:
1) Install the TFOCS toolbox on your computer: http://cvxr.com/tfocs/
2) Run AutoObjectCounter.m
3) Choose the folder to be analyzed.
4) Wait for the analysis; it takes about 1 minute per second of video for .avi-formatted videos.
5) Once the analysis is complete, a VidCountRslts.csv file will be in the folder from step 3, containing the name of each video in the folder and its corresponding count.

Crowd Estimation Lessons Learned
• Accuracy is best when the video is stable, so a tripod is highly recommended.
• Taking video with a digital camera that outputs .avi format takes less memory, processes faster, and is easier to convert than using an iPhone with .mov output.
• Ensure the computer used for processing has at least 8 GB of RAM.
• Video segments longer than about 25 seconds may crash MATLAB and your OS, depending on the computer's processing power.
• The recommended video segment length is 10 to 20 seconds.
• Shorter video segments also make manual counting of moving objects easier.
• Talk to the mall administrators before taking videos inside the Church Street mall; otherwise the mall police will kick you out. Be discreet about taking videos: some people may become aggressive if they find you recording them.

Future Improvements
• Coding Efficiency
– Improving the MATLAB code for efficiency would save computing time and potentially allow longer video segments without crashing the computer or requiring large amounts of processing power.
• Integrate Feature Recognition
– Tracking people in crowds would be more accurate if feature recognition were integrated, enabling tracking of individual people instead of blobs.
• Frame-to-Frame Shading Stabilization
– Stabilizing the background color and shading of the video from frame to frame would eliminate false counts.
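The counting step itself can be illustrated without MATLAB's toolbox. Below is a hedged, NumPy-only sketch that counts 4-connected foreground blobs in a binary mask, such as a thresholded sparse component S from background subtraction; the count_blobs name, the tiny example mask, and the min_area parameter are assumptions for the demo (the real program uses the Computer Vision System Toolbox's blob analysis plus Kalman tracking).

```python
import numpy as np
from collections import deque

def count_blobs(mask, min_area=1):
    """Count 4-connected foreground components in a binary mask: an
    illustrative stand-in for the counter's blob-analysis step."""
    mask = np.asarray(mask, dtype=bool)
    seen = np.zeros_like(mask)
    h, w = mask.shape
    count = 0
    for i in range(h):
        for j in range(w):
            if mask[i, j] and not seen[i, j]:
                # flood-fill this component and measure its area
                area, q = 0, deque([(i, j)])
                seen[i, j] = True
                while q:
                    a, b = q.popleft()
                    area += 1
                    for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        x, y = a + da, b + db
                        if 0 <= x < h and 0 <= y < w and mask[x, y] and not seen[x, y]:
                            seen[x, y] = True
                            q.append((x, y))
                if area >= min_area:
                    count += 1
    return count

mask = np.array([[1, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 1, 0, 1]])
print(count_blobs(mask))   # prints 3
```

A min_area threshold helps suppress the single-pixel noise that frame-to-frame shading changes introduce, one of the false-count sources noted above.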
References – Papers
• E. Candès, "Compressive Sampling," Proceedings of the International Congress of Mathematicians, Madrid, Spain, 2006.
• M. Fornasier and H. Rauhut, "Compressive Sensing," in Handbook of Mathematical Methods in Imaging, Springer, Heidelberg, Germany, 2011.
• J. Wright et al., "Robust Face Recognition via Sparse Representation," IEEE Trans. PAMI, Feb. 2009.
• M.A. Davenport, M.F. Duarte, Y.C. Eldar, and G. Kutyniok, "Introduction to Compressed Sensing," in Compressed Sensing: Theory and Applications, Cambridge University Press, 2012.
• D. Barchiesi and M. Plumbley, "Learning Incoherent Dictionaries for Sparse Approximation Using Iterative Projections and Rotations," IEEE Trans. Signal Process., vol. 61, no. 8, Apr. 2013.
• Y. Zhang, "Theory of Compressive Sensing via L1 Minimization: A Non-RIP Analysis and Extensions," Rice University, Houston, TX, Tech. Rep., 2008.
• I. Ram, M. Elad, and I. Cohen, "The RTBWT Frame – Theory and Use for Images," working draft to be submitted soon.
• Z. Lin, M. Chen, L. Wu, and Y. Ma, "The Augmented Lagrange Multiplier Method for Exact Recovery of Corrupted Low-Rank Matrices," Mathematical Programming, submitted, 2009.
• D.L. Donoho, "Compressed Sensing," IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
• I. Ram, M. Elad, and I. Cohen, "Image Processing Using Smooth Ordering of Its Patches," to appear in IEEE Trans. Image Process.
• M. Elad, "Sparse and Redundant Representation Modeling – What Next?", IEEE Signal Process. Lett., vol. 19, no. 12, pp. 922–928, Dec. 2012.
• A.M. Bruckstein, D.L. Donoho, and M. Elad, "From Sparse Solutions of Systems of Equations to Sparse Modeling of Signals and Images," SIAM Review, vol. 51, no. 1, pp. 34–81, Feb. 2009.
• E. Candès, X. Li, Y. Ma, and J. Wright, "Robust Principal Component Analysis," Journal of the ACM, vol. 58, no. 3, May 2011.
References – Resources
• Compressed Sensing Audio Demonstration: http://sunbeam.ece.wisc.edu/csaudio/
• SMALLbox: https://code.soundsoftware.ac.uk/projects/smallbox
• Compressed Sensing Video Lectures:
– Low-rank modeling: http://videolectures.net/mlss2011_candes_lowrank/
– Matrix Completion via Convex Optimization: Theory and Algorithms: http://videolectures.net/mlss09us_candes_mccota/
– An Overview of Compressed Sensing and Sparse Signal Recovery via L1 Minimization: http://videolectures.net/mlss09us_candes_ocsssrl1m/
– L1 Minimization: http://videolectures.net/nips09_bach_smm/
– Basics of Probability and Statistics for Machine Learning: http://videolectures.net/bootcamp07_keller_bss/
• Least Squares Estimates: http://www.khanacademy.org
• Compressive Sensing Resources: http://dsp.rice.edu/cs
• TFOCS Toolbox: http://cvxr.com/tfocs/
• Computer Vision System Toolbox from MATLAB: http://www.mathworks.com/products/computer-vision/