Foundations & Core in Computer Vision: A System Perspective
Ce Liu
Microsoft Research New England

Vision vs. Learning
• Computer vision: visual application of machine learning?
• Data → features → algorithms → data
• ML: design algorithms given input and output data
• CV: find the best input and output data given available algorithms

Theoretical vs. Experimental
• Theoretical analysis of a visual system
  – Best & worst cases
  – Average performance
• Theoretical analysis is challenging as many visual distributions are hard to model (signal processing: 2nd order processes, machine learning: exponential families)
• Experimental approach: full spectrum of system performance as a function of the amount of data, annotation, number of categories, noise, and other conditions

Quality vs. Speed
• HD videos, billions of images to index
• Real time & 90% vs. one hour per frame & 95%?
• Mechanism to balance quality and speed in modeling

Automatic vs. semi-automatic
• Common review feedback: parameters are hand-tuned; not clear how to set the parameters
• Vision system user feedback: I don’t know how to tweak parameters!
• Computer-oriented vs. human-oriented representations
• Human-in-the-loop (collaborative) vision
  – How to optimally use humans (what, which and how accurate) beyond traditional active learning
  – Model design by crowd-sourcing
  – Learning by subtraction

Algorithms vs. Sensors
• Two approaches to solving a vision problem
  – Look at images, design algorithms, experiment, improve…
  – Look at cameras, design new/better sensors, …
• Cameras for full-spectrum, high res, low noise, depth, motion, occluding boundary, object, …
• What’s the optimal sensor/device for solving a vision problem?
• What’s the limit of sensors?

Thank you!
Ce Liu
Microsoft Research New England
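
Illustrative sketch (not from the talk): the experimental approach named under "Theoretical vs. Experimental" amounts to measuring system performance across a grid of conditions rather than at a single operating point. The Python below is a minimal example under stated assumptions: the data are synthetic 2-D Gaussian blobs standing in for image features, the "system" is a nearest-centroid classifier, and only two conditions are swept (training-set size and annotation noise). The names `sample`, `fit`, `accuracy`, and `sweep` are hypothetical helpers, not part of any library.

```python
import random

def sample(n, label_noise, rng):
    """Draw n labelled 2-D points from two Gaussian blobs (synthetic stand-in for image features)."""
    points = []
    for i in range(n):
        label = i % 2                          # balanced classes
        mean = 1.5 if label == 1 else -1.5     # assumed class means, chosen for illustration
        x = (rng.gauss(mean, 1.0), rng.gauss(mean, 1.0))
        if rng.random() < label_noise:         # simulate annotation errors by flipping the label
            label = 1 - label
        points.append((x, label))
    return points

def centroid(points):
    """Mean of a list of 2-D points; falls back to the origin if the list is empty."""
    if not points:
        return (0.0, 0.0)
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def fit(train):
    """Nearest-centroid model: one centroid per class label (0 and 1)."""
    return [centroid([x for x, y in train if y == c]) for c in (0, 1)]

def predict(model, x):
    """Return the label of the closest centroid (squared Euclidean distance)."""
    d = [(x[0] - c[0]) ** 2 + (x[1] - c[1]) ** 2 for c in model]
    return 0 if d[0] <= d[1] else 1

def accuracy(model, test):
    return sum(predict(model, x) == y for x, y in test) / len(test)

def sweep(train_sizes, noise_levels, seed=0, n_test=2000):
    """Measure test accuracy for every (training size, label noise) condition."""
    rng = random.Random(seed)
    test = sample(n_test, 0.0, rng)            # clean held-out set shared by all conditions
    results = {}
    for n in train_sizes:
        for noise in noise_levels:
            model = fit(sample(n, noise, rng))
            results[(n, noise)] = accuracy(model, test)
    return results

if __name__ == "__main__":
    table = sweep([10, 100, 1000], [0.0, 0.2, 0.4])
    for (n, noise), acc in sorted(table.items()):
        print(f"train={n:5d}  noise={noise:.1f}  accuracy={acc:.3f}")
```

The same loop structure extends to the other conditions listed on the slide (number of categories, other noise sources) by adding further axes to the grid; a real study would also repeat each condition over several seeds and report the spread.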