COGS 300 Notes January 9, 2014 Research Methodology Artificial Intelligence Artificial Intelligence. . . . . . is the intelligent connection of perception to action This definition of artificial intelligence is delightfully circular since the word “intelligence” appears twice. But, this is OK. Biology is the study of living things. Biology advances even though there never has been (and probably never will be) an accepted medical, legal and ethical definition of what it means for something to be alive. Consider the example of Helen Keller (1880–1968). In 1882, at 19 months of age, Keller caught a fever that was so fierce she nearly died. She survived but the fever left its mark — she could no longer see or hear. Because she could not hear she also found it very difficult to speak. Her teacher was Anne Sullivan. Keller relied a great deal on Anne Sullivan, who accompanied her everywhere for almost fifty years. Without her faithful teacher Keller would probably have remained trapped within an isolated and confused world. Given her life accomplishments, no one would suggest that Helen Keller was mentally disabled. But, her perceptual disabilities made it very difficult for her to learn. That she did learn is a great credit to her and to Anne Sullivan. It is fair to say that someone like Helen Keller is blind (i.e., unable to see), deaf (i.e., unable to hear) and mute (i.e., unable to speak). It is not fair to call her dumb. (The word “dumb” has a double meaning in English. According to Webster, it means, “lacking the power of speech” and also “markedly lacking in intelligence: synonym stupid”). Given this, referring to someone who cannot speak as “dumb” now is considered offensive. In the same way, I would argue that referring to machines that cannot see or hear as “dumb” also is offensive. Intelligence requires both cognitive (i.e., mental) and perceptual abilities. There are two main motivations for being interested in artificial intelligence, a scientific one and an engineering one. 1 Artificial Intelligence Artificial Intelligence Understanding Intelligence Making Machines Intelligent Let’s look a bit deeper at computational perception, in this case computational vision. The following diagram helps to identify various sub-fields related to computational vision. Computational Vision: Domains and Mappings 3D WORLD 2D IMAGE PERCEPTION 1 3 2 4 The four numbered arrows identify four sub-areas. Arrow 1: Computer graphics deals with how the 3D world determines the 2D image. For objects of known shape, made of known materials, illuminated and viewed in a known way, computer graphics renders (i.e., synthesizes) the requisite image. Arrow 2: Computer vision deals with the inverse problem, namely given an image what can we say about the shape, and material composition of the objects in view and about the way they are illuminated and viewed. 2 Arrow 3: Is perception of the 3D world direct? Our diagram takes view that visual perception of the 3D world is mediated by the 2D image. One can, of course, generate 2D images artifically that don’t correspond to any possible 3D scene and study how these images are perceived. This, indeed is one of the things that perceptual psychologists do. Sometimes, “seeing is deceiving.” Arrow 4: Different images can produce the identical perception. Examples of perceptual metamers are well known in areas such as lightness, colour, texture and motion. In general, arrow 4 deals with the question, “What is the equivalence class of images that give rise to identical perceptions?” We end our brief discussion of research methodology with a look at five dimensions of R&D (applicable to any field). The distinctions we make are important, especially to a COGS student contemplating his or her future career. R&D Dimensions pure applied theoretical experimental problem−driven technique−driven analytic synthetic science engineering Pendulum Analogy (in Computational Vision) image analysis bottom up statistical methods model free scene analysis top down symbolic methods model based 3 (Prefer) Spiral Analogy knowledge pull time technology push progress Bob’s Research Bob’s research is in “embodied” computer vision. The goal is provide computers with the ability to perceive the 3-D world visually and, in the context of robotics, the ability to act directly in the 3-D world in order to accomplish a given task or tasks. Research Questions For a specified task, the question becomes, “What information about the 3-D world is needed to accomplish the task?” This naturally leads to two further questions: 1. What limits our ability to acquire the needed information? 2. What could we do with the information if we had it? Let’s look at a few research examples. We begin with (early) examples of work done in remote sensing for the St. Mary Lake, BC, forestry test site. Given surface topography in the form of a Digital Terrain Model (DTM), one can produce a shaded relief map for any assumed direction of illumination. The standard used in cartography is illumination from the north west at an elevation of 45 degrees. Question: Can you guess why this illumination direction is “standard?” Example 1 shows the St. Mary Lake test site first under standard cartographic illumination. 4 Example 1: St. Mary Lake NW Next, we show the St. Mary Lake test site rendered with illumination from the north east at an elevation of 45 degrees. Example 1 (cont’d): St. Mary Lake NE Note: In British Columbia, sun illumination never comes from the north west or the north east. We can continue the fiction by producing a colour composite corresponding to a red sun from the north west and a cyan (i.e., blue+green) sun from the north east. 5 Example 1 (cont’d): St. Mary Lake NW (R) NE (BG) NASA’s Landsat satellites are in sun synchronous orbits. Their time of passage over most of British Columbia is approximately 9:30am PST. Thus, the direction of sun illumination is (roughly) constant, 2.5 hours before local solar noon, although the sun elevation does vary considerably between summer and winter. Example 2 shows how comparison of images acquired at different times can be used for change detection. First, here is an actual image of St. Mary Lake acquired in September, 1974. The illumination direction is approximately from the south east. Note: This illumination “from below” can result in a reversed perception of mountains and valleys. 6 Example 2: St. Mary Lake 1974 Now, here is an actual image of St. Mary Lake acquired in September, 1979. Again, the illumination direction is approximately from the south east. Example 2 (cont’d): St. Mary Lake 1979 Change becomes apparent when we combine the two images into a colour composite. 7 Example 2 (cont’d): St. Mary Lake 1974(BG) 1979(R) Fast forward now to recent work on sports broadcast video analysis. Example 3: Okuma, Lu and Gupta Here is a recent sample video sequence processed based on the combined thesis work of three LCI graduate students: Video: 1000 frame broadcast hockey sequence Credit: Kenji Okuma, Wei-Lwun Lu, Ankur Gupta http://www.cs.ubc.ca/~little/mcGill_1k.mpg Example 3 is a 1000 frame hockey broadcast video sequence processed using our (December, 2010) automated system. Ankur Gupta’s tool automatically registers each video frame to the (geometric) rink model. Kenji Okuma and Wei-Lwun Lu have an automatic tracker to detect players and to connect these detections into tracks. The visualization tool maps the detected tracks onto the rink model shown in the upper left. Each player is a circle with a small track of the last n frames behind it. The model of the rink also is shown in red and in red circles re-projected onto each video frame. 8 Tsinko 2010 Egor Tsinko’s 2010 UBC M.Sc thesis, “Background Subtraction with Pan/Tilt Camera” extended three standard background subtraction methods to accommodate a moving, pan/tilt camera. Example 3: Background Subtraction with Pan/Tilt Camera Egor Tsinko’s 2010 UBC M.Sc thesis explored background subtraction with a (moving) pan/tilt camera. See the web site http://www.cs.ubc.ca/nest/lci/thesis/etsinko/ for an on-line (PDF) copy of the thesis as well as for links demonstrating results applied to six different video sequences. (Details are described in the thesis). One test sequence is a hockey sequence we have used in other work http://www.cs.ubc.ca/nest/lci/thesis/etsinko/seq5_mog_image.mpg Pixels tinted red are foreground. These correspond to moving objects and to newly seen parts of the scene not yet determined to be stationary and therefore part of the background Duan 2011 Example 4: Puck Location and Possession Andrew Duan’s M.Sc thesis (August, 2011) integrates the determination of puck location and possession into our sports video analysis system Here’s what Andrew’s system does with the same hockey video sequence we saw before: Video: 1000 frame broadcast hockey sequence Credit: Xin Duan (Andrew) http://www.cs.ubc.ca/~duanx/files/PuckTracking_Possession.wmv 9 Chen 2012 Picasso’s “Light Drawings” Photo credit: Gjon Mili – Time & Life Pictures/Getty Images, 1949 Photographer Gjon Mili visited Pablo Picasso in the South of France in 1949. “Picasso gave Mili 15 minutes to try one experiment,” LIFE magazine wrote in its January 30, 1950, issue in which the centaur image above first appeared. Picasso was so fascinated by the results that he posed for five sessions. The photographs, known as Picasso’s “light drawings,” were made with a small electric light in a darkened room. Read more at http://life.time.com/culture/picasso-drawing-with-light/#ixzz1ncwBwLex Optical Motion Capture (Re-think Trade-offs) Commercial optical motion capture systems: • high speed, high resolution cameras • short exposure times • accurate shutter synchronization Chen 2012, UBC Computer Science M.Sc thesis: • inexpensive, consumer-grade cameras • standard video rates 10 • long (i.e., constant) exposure times • unsynchronized cameras Marker positions become motion streaks (which simplifies the tracking problem even for fast object motion) We are able to exploit the lack of camera synchronization to achieve temporal super-resolution. Motion Streaks One frame (left) and three successive frames (right), shown as RGB, from a spinning fan Figure credit: Xing Chen Demo: a 7-camera setup in ICCS X209 Figure credit: Xing Chen 11 Demo: Wired and Ready A simplified human skeleton with 15 landmark points Figure credit: Xing Chen Demo: Human Motion Capture • video: Arm motion • video: Jumping in place — long streaks occur in this video • video: Marching forward and back These videos play first at normal speed then at 1/4 their original speed. See the web page http://www.cs.ubc.ca/~woodham/ccd2013/ for links to an on-line (PDF) copy of Chen’s M.Sc thesis as well links to a poster and supplementary videos presented at the 2nd IEEE International Workshop on Computational Cameras and Displays (CCD 2013), June 28, 2013 Video credits: Xing Chen 12