Binocular disparity and Stereopsis

Bruce Cumming
Laboratory of Sensorimotor Research, National Eye Institute, National Institutes of Health

[Figure: stereo anaglyph. Put red lens over left eye, blue lens over right eye. Stereo anaglyph by Prof. Michael Greenhalgh, Australian National University (with permission).]

stereopsis
[Figure: left (L) and right (R) eye views]

correspondence problem
[Figure: left eye's image and right eye's image]

random-dot patterns
• a completely unnatural stimulus
• image changes every few ms
• no recognisable objects, e.g. faces
• each dot has dozens of identical potential matches
• and yet a clear perception of depth!

Neurons and depth perception
• A simple model to generate disparity signals.
• How neurons reflect this.
• Some psychophysical limits this explains.
• Further processing.

[Figure: head with left retina, right retina, fovea and receptive field; a disparity-selective neuron with a left RF and a right RF. Head image from Royal Holloway University of London Vision Research Group (with permission).]

basic building-block
• inner product of image with receptive field: $v = \iint \mathrm{RF}(x, y)\, I(x, y)\, dx\, dy$
• output: $\mathrm{Pos}(v)$
[Figure: receptive-field weights (…, -0.1, +1, +1, …) applied to the image and summed; S = response]

• Left RF output = l, right RF output = r
• out $= (l + r)^2 = l^2 + r^2 + 2lr$
[Figure: output versus input (membrane V), with labelled inputs l1, r1, l2, r2]

Circuitry for complex cell
[Figure: left and right receptive fields RF1 and RF2 feeding binocular simple cells BS 1 to BS 4, which feed the complex cell Cx]
• If RF2 = -RF1 in both eyes, then half-squaring and then summing is equivalent to simply squaring.
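The equivalence between half-squaring-then-summing and plain squaring can be checked numerically. The following is a minimal Python sketch using only the standard library; the function name `pos`, the random values standing in for the left and right inner products, and the sample count are illustrative assumptions, not part of the slides.

```python
import random

def pos(v):
    """Half-wave rectification: pass positive values, clip negatives to zero."""
    return max(v, 0.0)

random.seed(0)
# Hypothetical left- and right-eye inner products (l, r) for 1000 random images.
pairs = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(1000)]

max_error = 0.0
for l, r in pairs:
    # One subunit uses RF1 in both eyes, the other uses RF2 = -RF1 in both eyes.
    half_squared_sum = pos(l + r) ** 2 + pos(-(l + r)) ** 2
    full_square = (l + r) ** 2
    expanded = l * l + r * r + 2 * l * r
    max_error = max(max_error,
                    abs(half_squared_sum - full_square),
                    abs(full_square - expanded))

print("largest discrepancy:", max_error)
```

For any input only one of the two rectified subunits is non-zero, which is why their sum reproduces the full square.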
energy model
$C = \sum_{j=1}^{n} (v_{Lj} + v_{Rj})^2$
• $v_{Lj}$: convolution of left eye's image with jth left receptive field
• $v_{Rj}$: convolution of right eye's image with jth right receptive field
• add together, square the result, sum over many such subunits

[Figure: complex cell and model responses as a function of left stimulus position and right stimulus position (Ohzawa et al., 1990)]

[Figure: disparity-selective neuron with left RF and right RF; complex cell and model responses as a function of left stimulus position and right stimulus position (Ohzawa et al., 1990)]

[Figure: disparity-selective neuron with left RF at -d and right RF at d; correlation (-1 to 1) versus disparity (pixels)]

[Figure: correlation versus disparity (pixels) for Patterns 1 to 5 and their mean]

[Figure: left RF and right RF; correlation versus disparity, from -d to d]

[Figure: cat simple cell RF maps (DeAngelis, Ohzawa and Freeman, 1991)]

For single subunits (simple)
• Odd-symmetric disparity tuning implies phase disparity.
• Even symmetry around a non-zero disparity implies position disparity.
True for complex cells if:
• All subunits have the same phase disparity.
• All subunits have the same position disparity.

Monkey complex cells
[Figure: firing rate (spikes/s) versus disparity (degrees) for cells duf043 and duf065]

So far:
• Energy model measures cross-correlation after filtering.
• V1 contains a bank of filters measuring these correlations after displacements of both phase and position.
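A toy simulation may help make the energy model concrete. The sketch below is a simplified one-dimensional version under stated assumptions: images are random rows of +1/-1 pixels standing in for random-dot patterns, the receptive fields are Gabor profiles in a quadrature pair, the right RF is displaced by a hypothetical preferred disparity of 4 pixels, and disparity is applied as a circular shift. None of these parameter values come from the slides.

```python
import math
import random

def gabor(n, center, sigma, period, phase):
    """1-D Gabor receptive-field profile sampled at pixels 0..n-1."""
    return [math.exp(-(x - center) ** 2 / (2 * sigma ** 2))
            * math.cos(2 * math.pi * (x - center) / period + phase)
            for x in range(n)]

def inner(rf, image):
    """Inner product of a receptive field with an image."""
    return sum(w * i for w, i in zip(rf, image))

def tuning_curve(pref_disparity=4, n=128, n_patterns=200, seed=0):
    """Mean energy-model response C = sum_j (v_Lj + v_Rj)^2 at each stimulus disparity."""
    rng = random.Random(seed)
    phases = (0.0, math.pi / 2)  # two subunits in quadrature (an assumption)
    left_rfs = [gabor(n, n / 2, 6.0, 16.0, p) for p in phases]
    right_rfs = [gabor(n, n / 2 + pref_disparity, 6.0, 16.0, p) for p in phases]
    disparities = list(range(-12, 13))
    curve = [0.0] * len(disparities)
    for _ in range(n_patterns):
        left = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        v_left = [inner(rf, left) for rf in left_rfs]
        for k, d in enumerate(disparities):
            right = left[-d:] + left[:-d]  # right image = left image shifted by d pixels
            v_right = [inner(rf, right) for rf in right_rfs]
            curve[k] += sum((vl + vr) ** 2 for vl, vr in zip(v_left, v_right))
    return disparities, [c / n_patterns for c in curve]

disparities, curve = tuning_curve()
peak_disparity = disparities[curve.index(max(curve))]
print("disparity with the largest mean response:", peak_disparity)
```

Averaged over many patterns, the response is largest near the displacement between the two receptive fields, which is the sense in which the model measures a cross-correlation after filtering.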
[Figure: disparity-selective neuron with left RF and right RF]

the neuronal response
[Figure: response (spikes/sec) versus time (sec) at corrugation frequencies (cf) of 0.06 cpd and 0.5 cpd, with the f1 and f0 components marked]
[Figure: relative modulation versus corrugation frequency (cpd)]

[Figure: corrugation cutoff (cpd) versus 1/(2π × SD of RF height) (degree^-1); output exponent 1 or 2; n = 19, r = 0.45; predicted from mean V1 response (mean eccentricity 3.7°)]

Temporal impulse response (LGN)
[Figure: temporal impulse response with a 10 ms scale bar (Reppas, Usrey and Reid, 2000)]

Temporal frequency tuning for contrast and disparity
[Figure: relative modulation (f1/f0) and response (spikes/sec) versus temporal frequency (Hz), for a drifting luminance grating and for disparity modulation]
[Figure: tf cutoff for drifting luminance grating (Hz) versus tf cutoff for disparity modulation (Hz); n = 27]

Summary
• We don't solve the correspondence problem dot-by-dot.
• Is this enough?
[Figure: monocular response × correlation, each versus RF disparity (pixels)]

Disparity is two-dimensional
[Figure: viewing geometry with points P and P', direction of gaze, fovea, nodal point and epipolar line, on X, Y, Z axes]
[Figure: probability density function for disparities encountered during natural viewing; vertical disparity (degrees) versus horizontal disparity (degrees), shown at two scales]
[Figure: preferred 2-D disparity; vertical disparity (degrees) versus horizontal disparity (degrees)]
Stevenson and Schor (1997); Schreiber et al. (2001)

Searching for Matches
• Not a 2-D problem.
• Vertical extent of RF may be enough to deal with most epipolar lines.
[Figure: correlation versus RF disparity (pixels)]

Size-disparity correlation
[Figure: disparity range (min) versus spatial period of center frequency; 1/(threshold contrast) versus binocular phase, by center spatial frequency (Smallman and MacLeod, 1994)]

Size-disparity correlation (2)
Prince and Eagle (1999)
[Figure: left (L) and right (R) RFs; correlation versus RF disparity (pixels) at several stimulus disparities]
[Figure: threshold for 2 cpd, 0.5 cpd, 2 cpd + 0.5 cpd, and 2 cpd half cycle (Farell, Li and McKee, 2004)]
[Figure: correlation versus disparity (pixels) (Tsai and Victor, 2003)]

anti-correlated stimuli
[Figure: left eye's image and right eye's image, with black and white exchanged in one eye; correlation versus RF disparity (pixels)]

energy model simulation
[Figure: simulated firing rate versus disparity for correlated stimuli and anti-correlated stimuli]
[Figure: firing rate (spikes/s) versus disparity (degrees) for cells rb332 and rb313; correlated disparity and anticorrelated disparity]
[Figure: energy model prediction; firing rate (spikes/s) versus disparity for correlated and anticorrelated disparity]
[Figure: cell rb313; firing rate (spikes/s) versus disparity for correlated and anticorrelated disparity; weaker response for anti-correlated stimuli]

what the energy model gets wrong
• quantitative response to anticorrelation: real cells respond more weakly to anticorrelated stimuli

[Figure: firing rate (spikes/s) for monocular stimuli in the left and right eye: "this cell is monocular"]
[Figure: disparity tuning curve; firing rate (spikes/s) versus disparity (degrees), shown with the monocular left and right responses; left eye has purely inhibitory effect]
[Figure: Disparity Discrimination Index (DDI) versus Ocular Dominance Index]

but!
• this isn't possible in the energy model.
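One way to see why a purely suppressive eye does not fit the energy model is to compare mean responses across random patterns. The short derivation below is a sketch under assumptions that are not stated on the slides: the two eyes' images are statistically independent, the inner products $v_L$ and $v_R$ have zero mean across patterns, and a blank eye contributes $v = 0$. Angle brackets denote an average over patterns.

```latex
% Assumptions (not from the slides): independent images in the two eyes,
% zero-mean v_L and v_R, and v = 0 for an unstimulated eye.
\begin{align*}
\langle C_{\text{both eyes, uncorrelated}} \rangle
  &= \langle (v_L + v_R)^2 \rangle
   = \langle v_L^2 \rangle + \langle v_R^2 \rangle
     + 2\,\langle v_L \rangle \langle v_R \rangle \\
  &= \langle v_L^2 \rangle + \langle v_R^2 \rangle
   \;\ge\; \langle v_R^2 \rangle
   = \langle C_{\text{right eye only}} \rangle
\end{align*}
```

Under these assumptions, stimulating the left eye can only add to the mean response of each squared subunit, so it cannot pull the response below the right-eye-only level.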
the energy model says that each eye sends both excitatory and inhibitory input
[Figure: receptive fields with ON regions and OFF regions feeding a binocular simple cell (BS)]

[Figure: disparity tuning curve; firing rate (spikes/s) versus disparity (degrees), shown with the monocular left and right responses and the uncorrelated level; inhibition from left eye]

Response rates to random dots
[Figure: monocular response versus binocular uncorrelated response, with lines for an ideal monocular neuron and an ideal binocular neuron]

what the energy model gets wrong
• quantitative response to anticorrelation: real cells respond more weakly to anticorrelated stimuli
• cells where one eye always inhibits firing: not possible within the energy model

energy model:
• disparity tuning curve is the cross-correlation of the left and right eye's receptive fields.
$C = (v_L + v_R)^2 = v_L^2 + v_R^2 + 2 v_L v_R$
$D = 2\,(L \star R)$
[Figure: left eye's receptive field, right eye's receptive field, and the resulting disparity tuning curve]

how to test
[Figure: simple cells (S) feeding a complex cell (Cx)]
• measure receptive fields?
• not possible for complex cells.
• make the comparison in Fourier space.
• this works for simple and complex cells.
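The move to Fourier space relies on the cross-correlation theorem: the amplitude spectrum of a cross-correlation equals the product of the two amplitude spectra. The following minimal Python sketch checks this for $D = 2\,(L \star R)$ with two hypothetical one-dimensional Gabor receptive fields and circular wrap-around; the RF parameters, the helper names and the direct O(n^2) transform are illustrative assumptions.

```python
import cmath
import math

def dft_amplitude(signal):
    """Amplitude spectrum |FT| of a real sequence (direct O(n^2) DFT)."""
    n = len(signal)
    return [abs(sum(s * cmath.exp(-2j * math.pi * k * x / n)
                    for x, s in enumerate(signal)))
            for k in range(n)]

def gabor(n, sigma, period, phase):
    """1-D Gabor profile centred in a window of n pixels."""
    c = n / 2
    return [math.exp(-(x - c) ** 2 / (2 * sigma ** 2))
            * math.cos(2 * math.pi * (x - c) / period + phase)
            for x in range(n)]

n = 64
left_rf = gabor(n, 5.0, 12.0, 0.0)
right_rf = gabor(n, 7.0, 10.0, math.pi / 3)  # deliberately different from the left RF

# Disparity tuning curve D[d] = 2 * sum_x L[x] * R[x + d], with circular wrap-around.
tuning = [2.0 * sum(left_rf[x] * right_rf[(x + d) % n] for x in range(n))
          for d in range(n)]

amp_tuning = dft_amplitude(tuning)
amp_product = [2.0 * a * b
               for a, b in zip(dft_amplitude(left_rf), dft_amplitude(right_rf))]
max_diff = max(abs(a - b) for a, b in zip(amp_tuning, amp_product))
print("largest difference between the two spectra:", max_diff)
```

Because only amplitudes are compared, the check does not need the phase of either receptive field, which is what makes the same comparison usable for complex cells.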
energy model:
• disparity tuning curve is the cross-correlation of the left and right eye's receptive fields: $D = 2\,(L \star R)$
• the Fourier power spectrum of the disparity tuning curve is the product of the Fourier amplitude spectra of the left and right eye's receptive fields: $\mathrm{FT}^2(D) = 2\,\mathrm{FT}(L)\,\mathrm{FT}(R)$

[Figure: receptive field (RF), RF cross-section (versus position), its Fourier transform and Fourier spectrum (amplitude versus spatial frequency), and the spatial frequency tuning curve (firing rate versus spatial frequency)]

if the energy model is right:
• then by obtaining the cell's spatial frequency tuning,
• we obtain the Fourier amplitude spectrum of the RF profile.

[Figure: monocular spatial frequency tuning curves, left eye × right eye = Fourier spectrum of disparity tuning curve; all versus spatial frequency]

[Figure: spatial frequency tuning for the left eye and right eye; firing rate (spikes/s) versus spatial frequency (cycles per degree)]
[Figure: product of fitted monocular spatial frequency tuning curves, versus spatial frequency (cycles per degree)]
[Figure: disparity tuning curve; firing rate (spikes/s) versus disparity (degrees)]
[Figure: spatial frequency tuning, disparity tuning curve, and spectra in normalized units versus spatial frequency (cycles per degree); peak frequencies differ]
[Figure: product of fitted spatial frequency tuning curves compared with the Fourier transform of the fitted disparity tuning curve (minus baseline), in normalized units versus spatial frequency; too much power at DC]

[Figure: cell duf065; firing rate (spikes/s) versus disparity (degrees), and right- and left-eye firing rate versus spatial frequency (cpd)]
[Figure: cell duf043; firing rate (spikes/s) versus disparity (degrees), and right- and left-eye firing rate versus spatial frequency (cpd)]

what the energy model gets wrong
• quantitative response to anticorrelation: real cells respond more weakly than predicted to anticorrelated stimuli
• suppressive effect from one eye: not possible within the energy model
• mismatch between disparity frequency and response to gratings: real disparity tuning curves have more power at low frequencies than predicted

how can we fix the problem?
• one simple modification to the energy model.
• keeps all the successes of the energy model.
• but fixes all these problems at a stroke!
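The first discrepancy listed above can be made concrete by asking what the unmodified energy model does with anticorrelated stimuli. In the minimal Python sketch below, a single squared subunit sees random +1/-1 patterns; contrast inversion in one eye is modelled by flipping the sign of $v_R$. The RF shape, the pattern count and the variable names are illustrative assumptions.

```python
import math
import random

def inner(rf, image):
    """Inner product of a receptive field with an image."""
    return sum(w * i for w, i in zip(rf, image))

random.seed(1)
n = 64
# The same Gabor-like RF in both eyes (hypothetical parameters).
rf = [math.exp(-(x - n / 2) ** 2 / (2 * 5.0 ** 2))
      * math.cos(2 * math.pi * (x - n / 2) / 12)
      for x in range(n)]

disparities = list(range(-12, 13))
correlated = [0.0] * len(disparities)
anticorrelated = [0.0] * len(disparities)
unmodulated = [0.0] * len(disparities)  # v_L^2 + v_R^2, the part that does not invert

for _ in range(100):
    left = [random.choice((-1.0, 1.0)) for _ in range(n)]
    v_l = inner(rf, left)
    for k, d in enumerate(disparities):
        right = left[-d:] + left[:-d]          # right image = left image shifted by d
        v_r = inner(rf, right)
        correlated[k] += (v_l + v_r) ** 2
        anticorrelated[k] += (v_l - v_r) ** 2  # inverting contrast flips the sign of v_R
        unmodulated[k] += v_l ** 2 + v_r ** 2

corr_modulation = [c - u for c, u in zip(correlated, unmodulated)]
anti_modulation = [a - u for a, u in zip(anticorrelated, unmodulated)]
amplitude_ratio = ((max(anti_modulation) - min(anti_modulation))
                   / (max(corr_modulation) - min(corr_modulation)))
print("anticorrelated / correlated modulation amplitude:", amplitude_ratio)
```

In this sketch the disparity-modulated part for anticorrelated patterns is the mirror image of the correlated one, so the amplitude ratio is 1; real cells show a smaller ratio, which is the discrepancy.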
energy model
$C = \sum_{j=1}^{n} (v_{Lj} + v_{Rj})^2$
[Figure: images, receptive fields, binocular simple cell (BS), disparity-selective complex cell (Cx)]

Read's modified version
$C = \sum_{j=1}^{n} (\mathrm{Pos}\,v_{Lj} + \mathrm{Pos}\,v_{Rj})^2$, where Pos = half-wave rectification
[Figure: images, receptive fields, monocular simple cells (MS), binocular simple cell (BS), disparity-selective complex cell (Cx)]

suppression from one eye
[Figure: the modified-model circuit and its formula in terms of $\mathrm{Pos}\,v_{Lj}$ and $\mathrm{Pos}\,v_{Rj}$, with the input from one monocular simple cell purely inhibitory; with that input alone the cell never fires]

problems our model solves
• suppressive effect from one eye: inhibitory synapse after monocular simple cell

[Figure: firing rate (spikes/s) versus spatial frequency (cycles/degree); firing rate (spikes/s) versus disparity (degrees); normalized units versus spatial frequency (cycles per degree)]

$C = (v_L + v_R)^2 = v_L^2 + v_R^2 + 2 v_L v_R$
• when $v_L$ and $v_R$ are negatively correlated, the last term tends to be negative,
• pulling the response down below the uncorrelated level.
[Figure: firing rate (spikes/s) versus disparity]

$C = (\mathrm{Pos}\,v_L + \mathrm{Pos}\,v_R)^2 = (\mathrm{Pos}\,v_L)^2 + (\mathrm{Pos}\,v_R)^2 + 2\,\mathrm{Pos}\,v_L\,\mathrm{Pos}\,v_R$
• when $v_L$ and $v_R$ are negatively correlated, the last term is zero,
• pushing the response up closer to the uncorrelated level.
[Figure: firing rate (spikes/s) versus disparity]

[Figure: disparity tuning curves (versus disparity) for the energy model and our modified version]
[Figure: Fourier power spectrum versus spatial frequency; energy model: no power at DC; our modified version: increased power at DC]

[Figure: circuit of receptive fields, monocular simple cells (MS), binocular simple cell (BS) and complex cell (Cx), shown with threshold at zero and with increased threshold]

[Figure: disparity tuning curve and Fourier power spectrum (versus spatial frequency) for the energy model (no power at DC), our modified version with zero threshold (increased power at DC), and our modified version with high threshold (maximum power at DC)]

problems our model solves
• suppressive effect from one eye: inhibitory synapse after monocular simple cell
• mismatch between disparity frequency and response to gratings: threshold boosts power at low frequencies

anticorrelation
$C = v_L^2 + v_R^2 + 2 v_L v_R$
• image in one eye replaced with its negative
• one of the convolutions changes sign
• disparity-modulated term inverts; amplitude unchanged: $C = v_L^2 + v_R^2 - 2 v_L v_R$
• a consequence of the linearity of the model

modified model
$C = (\mathrm{Pos}\,v_L)^2 + (\mathrm{Pos}\,v_R)^2 + 2\,\mathrm{Pos}\,v_L\,\mathrm{Pos}\,v_R$
• anticorrelation: convolution changes sign
$C = (\mathrm{Pos}\,v_L)^2 + (\mathrm{Pos}(-v_R))^2 + 2\,\mathrm{Pos}\,v_L\,\mathrm{Pos}(-v_R)$
• clearly the disparity-modulated term no longer simply inverts

[Figure: response versus disparity for the energy model and the modified model, with monocular simple cell (MS) inputs]

problems our model solves
• suppressive effect from one eye: inhibitory synapse after monocular simple cell
• mismatch between disparity frequency and response to gratings: threshold boosts power at low frequencies
• quantitative response to anticorrelation: with high enough thresholds, arbitrarily low amplitude ratios can be obtained

heterogeneity
• real neurons vary greatly in behavior.
• some are well described by the energy model.
• complex cells have many binocular subunits:
• perhaps some are like the energy model: linear binocular combination
• others are like our modified version: threshold prior to binocular combination

heterogeneity
• some binocular subunits as in our model…
• …others as in the original energy model
• complex cells receive input from many binocular subunits.
[Figure: complex cell (Cx) receiving input from binocular simple cells (BS), some of them driven via monocular simple cells (MS)]

summary
• the energy model gives a good qualitative account of disparity-tuned neurons.
• it has been widely used in computational models.
• there are a number of discrepancies when it is compared with quantitative data.
• A simple, plausible modification removes these discrepancies.
• Consequences for models of later processing largely unexplored.

Extrastriate cortex
• V2, V4, MT, and MST all show responses to anticorrelated RDS, like V1.
• IT does not respond to anticorrelated RDS.
• Does the solution have to be represented explicitly?

conclusion
• Good understanding of the mechanisms of disparity selectivity in primary visual cortex, without invoking complex network interactions.
• provides a firm basis for understanding the computations enabling stereo vision.

plus… a prediction
• Consider case where convolutions are equal and opposite: $v_L = -v_R$
• Original energy model: they cancel out, $C = (v_L + v_R)^2 = 0$
• Our version: no cancellation, $C = (\mathrm{Pos}\,v_L + \mathrm{Pos}\,v_R)^2 = (\mathrm{Pos}\,v_L)^2$ (taking $v_L > 0$)

disparate drifting grating
[Figure: drifting grating shown to the right eye and left eye]

typical simple cell response
[Figure: firing rate versus time]
• one burst of firing per cycle of the stimulus.
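The prediction can be illustrated with a toy drifting grating. In the minimal Python sketch below, each eye's inner product is modelled as a cosine in time, with an interocular phase difference between them; the energy-model simple cell is taken as the half-squared binocular sum and the modified cell rectifies each eye before combining. The cosine inputs, the burst-counting helper and all names are illustrative assumptions rather than the authors' simulation.

```python
import math

def pos(v):
    """Half-wave rectification."""
    return max(v, 0.0)

def simple_cell_responses(phase_deg, n_samples=1000):
    """Responses over one stimulus cycle for a given interocular phase difference."""
    phi = math.radians(phase_deg)
    energy, modified = [], []
    for k in range(n_samples):
        t = k / n_samples
        v_l = math.cos(2 * math.pi * t)
        v_r = math.cos(2 * math.pi * t + phi)
        energy.append(pos(v_l + v_r) ** 2)           # half-squared linear binocular sum
        modified.append((pos(v_l) + pos(v_r)) ** 2)  # rectify each eye, then combine
    return energy, modified

def count_bursts(rate, threshold=1e-9):
    """Number of silent-to-firing transitions around one (circular) cycle."""
    active = [r > threshold for r in rate]
    return sum(1 for k in range(len(active)) if active[k] and not active[k - 1])

for phase in (0, 180):
    energy, modified = simple_cell_responses(phase)
    print(phase, "deg:", count_bursts(energy), "burst(s) in the energy model,",
          count_bursts(modified), "in the modified version")
```

In this sketch the two models agree at 0° (one burst per cycle), while at 180° the energy-model cell stays silent and the modified cell fires once in each half-cycle.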
[Figure: firing rate versus time (one stimulus cycle)]

[Figure: right-eye and left-eye monocular simple cells (MS) feeding a binocular simple cell (BS), at phase difference 0° and half a cycle later]
[Figure: the same circuit at phase difference 180° and half a cycle later, with + and – inputs marked]
[Figure: responses of the energy model and the modified version over time (2 stimulus periods), for interocular phase differences 0°, 60°, 120°, 180°, 240° and 300°]
[Figure: responses (spikes/s) of cells hg226.0, hg212.0 and hg136.0 versus time (1 stimulus period), for interocular phase differences from -180° to 180°]

summary
• we postulate that some binocular simple cells receive input via monocular simple cells.
• straightforward, physiologically plausible mechanism.
• extends our repertoire so that we can account for all known observations.
• even predicted something before it was observed!

long-term goal of our work
to understand:
• the algorithm the brain uses for stereoscopic depth perception.
• how this algorithm is implemented physiologically.
• where this occurs within the brain.

The stereo correspondence problem
[Figure: correlation versus disparity (pixels) for Patterns 1 to 5 and their mean]
[Figure: right RF and left RF; correlation versus disparity, from -90 to 90]
[Figure: viewing geometry with X, Y, Z axes, the "straight ahead" direction for each eye, and image coordinates (xL, yL) and (xR, yR) for the left (L) and right (R) eyes]

basic building-block
• inner product of image with receptive field: $v = \iint \mathrm{RF}(x, y)\, I(x, y)\, dx\, dy$
• output: $\mathrm{Pos}(v)$
[Figure: receptive field with its "ON" region and "OFF" region marked]

[Figure: left RF at 0 and right RF at d; energy model simulation of simulated firing rate versus disparity for correlated stimuli and uncorrelated stimuli]

shape of disparity tuning
• a key prediction of the energy model.
• demonstrating this result would be strong evidence for the energy model.

[Figure: preferred grating frequency (cpd) versus disparity frequency (cpd)]
[Figure: normalized units versus spatial frequency, with the peak and a low frequency (0.05 cpd) marked]
[Figure: two panels comparing the monocular grating response with the FT of disparity tuning: peak frequency, and response at 0.05 cpd relative to response at peak]
[Figure: firing rate (spikes/s) versus spatial frequency (cycles/degree); firing rate (spikes/s) versus disparity (degrees); normalized units versus spatial frequency (cycles per degree)]
[Figure: correlation versus RF disparity (pixels)]