Learning Brightness Transfer Functions for the Joint Recovery of Illumination Changes and Optical Flow

Oliver Demetz¹, Michael Stoll², Sebastian Volz², Joachim Weickert¹, Andrés Bruhn²

¹ Mathematical Analysis Group, Saarland University, Saarbrücken, Germany
{demetz,weickert}@mia.uni-saarland.de

² Institute for Visualization and Interactive Systems, University of Stuttgart, Germany
{stoll,volz,bruhn}@vis.uni-stuttgart.de

Abstract.

The increasing importance of outdoor applications such as driver assistance systems or video surveillance tasks has recently triggered the development of optical flow methods that aim at performing robustly under uncontrolled illumination. Most of these methods are based on patch-based features such as the normalized cross correlation, the census transform or the rank transform. They achieve their robustness by locally discarding both absolute brightness and contrast. In this paper, we follow an alternative strategy: Instead of discarding potentially important image information, we propose a novel variational model that jointly estimates both illumination changes and optical flow. The key idea is to parametrize the illumination changes in terms of basis functions that are learned from training data. While such basis functions allow for a meaningful representation of illumination effects, they also help to distinguish real illumination changes from motion-induced brightness variations if supplemented by additional smoothness constraints. Experiments on the KITTI benchmark show the clear benefits of our approach. They not only demonstrate that it is possible to obtain meaningful basis functions, but also show state-of-the-art results for robust optical flow estimation.

1 Introduction

Three decades after the seminal work of Horn and Schunck [22], dense variational optical flow methods have found their way into numerous real-world applications such as driver assistance systems [33], markerless motion capture [13], long-term trajectory analysis [32] as well as motion-aware video editing [37]. Based on the minimization of a global energy functional that combines constancy assumptions (data term) with regularity constraints (smoothness term), variational methods allow both a transparent modeling and an accurate estimation of the results. Since many real-world applications require the processing of outdoor sequences, it is not surprising that the robustness of optical flow methods under uncontrolled illumination has become a major challenge. This is also reflected in the design of recent real-world benchmarks such as the KITTI Vision Benchmark Suite [14]. It provides challenging data from automotive scenarios that contains typical illumination changes due to automatic camera re-adjustments, changeable weather conditions or physical effects such as shadows and highlights.

The authors have contributed equally to this work.


Basic Optical Flow Approaches.

In order to tackle the problem of illumination changes, most approaches from the literature make use of constancy assumptions based on illumination-invariant image features. Such features achieve their robustness by locally discarding illumination-sensitive information such as absolute brightness or contrast [24, 27, 34, 36, 50]. In the extreme case, almost all information is discarded and only a relative local ordering is stored [5, 11, 35]. Methods based on such illumination-invariant features use constancy assumptions on higher order derivatives such as the gradient or the Hessian [23, 34, 38, 43, 51], photometric invariants [27, 45, 54] as well as mutual information [21]. Moreover, patch-based techniques have recently become very popular, such as the normalized cross correlation (NCC) [50], the rank transform [11, 53] and the census transform [5, 35, 39]. Some of the approaches also use illumination-robust descriptors from sparse feature matching such as SIFT [24, 25] and HOG [8, 36]. A comparison of some of these methods can be found in [40] and [46]. Similar in spirit are methods that discard illumination-relevant information via preprocessing. Typically, they employ the structure-texture decomposition [48] or derivative-type filters [41].

What all those aforementioned methods have in common is that they discard potentially valuable image information. However, if illumination changes are only moderate or not present at all, discarding brightness and contrast information may significantly deteriorate the results. Moreover, some of the transformations are highly non-linear and even lack differentiability, which results in a more complex optimization. Finally, most invariants are not defined at all locations, since illumination-invariant information cannot be extracted everywhere (e.g. in homogeneous regions). Summarizing: instead of discarding potentially valuable information, which may harm the estimation, it would be desirable to keep and exploit all available information when estimating the flow.

Advanced Optical Flow Approaches.

In fact, there are a few methods in the literature that follow the above mentioned idea by jointly estimating both illumination changes and the optical flow. On the one hand, there are approaches that seek to estimate a single global brightness transfer function to identify problematic image regions [10].

On the other hand, there are techniques that embed the classical brightness constancy assumption into a parametrized local illumination model for which the coefficients are jointly estimated. Such local models include simple additive terms [7, 28], affine illumination models [15, 18, 30] as well as complex brightness models derived from physics [20]. Recently, local and global ideas have also been combined [18]: While a local affine model allows to estimate the correspondences in a PatchMatch-like approach [1], the information is eventually condensed to a single global transfer function.

Preserving potentially valuable image information, however, is not the only advantage of approaches based on parametrized illumination models. If appropriate models are employed, imposing smoothness on the resulting parameter field allows to separate real illumination changes from motion-induced brightness variations. However, so far in the literature, the considered illumination models were either chosen ad-hoc [7, 15], or specifically tailored towards a certain physical process [20]. There have been no efforts so far to determine the most suitable model for a specific type of data. Moreover, it has not yet been investigated how the smoothness term of the coefficient field should be modeled such that it allows for a good separation of motion and illumination effects.

Finally, since all existing variational methods with parametrized illumination models are based on simple concepts for data and smoothness terms, it remains unclear how a more sophisticated joint method would perform on a suitable optical flow benchmark.

Our Contribution.

In this paper, we address all these questions. Firstly, we use a principal component analysis (PCA)-based approach with a clustering step to learn a suitable basis for the local brightness transfer functions from training data. Secondly, we propose a joint complementary regularizer for the basis coefficients that is based on a weighting scheme derived from the eigenvalues of the PCA. Thirdly, we embed the basis functions and the regularizer into a variational model that combines brightness and gradient constancy with a second-order smoothness term. Experiments demonstrate that our approach works very well in practice. We obtain meaningful basis functions, intuitive coefficient fields as well as state-of-the-art results on the KITTI benchmark.

Related Work on Basis Learning.

Apart from the aforementioned techniques that jointly estimate illumination changes and optical flow, a few more related works are worth mentioning. On the one hand, there are methods that address the estimation of camera response or brightness transfer functions, mainly in the context of HDR imaging. Such approaches include the work of Grossberg and Nayar [16] who proposed to compute the brightness transfer function via histogram specification, as well as the papers of Debevec and Malik [9] and Grossberg and Nayar [17] on estimating the camera response function, the latter one using learned basis functions. On the other hand, there are approaches that represent appearance changes with basis functions for illumination changes. Such methods include the template tracking approach of Hager and Belhumeur [19] as well as the work on iconic changes by Black et al. [3]. Finally, there exist a few optical flow methods that make use of spatial or temporal basis functions to model the flow. This applies to the approaches by Nir et al. [31] on over-parametrized optical flow and Garg et al. on temporal tracking of non-rigid objects with subspace constraints [12].

Organization.

In Section 2, we introduce our novel variational model based on brightness transfer basis functions and joint complementary coefficient regularization. Minimization issues are then discussed in Section 3. The estimation of the basis functions and the clustering step are explained in Section 4. Our results and a comparison to the literature are presented in Section 5. The paper ends with a summary in Section 6.

2 Variational Model

Let us consider a sequence of two images $f_i : \Omega \rightarrow \mathbb{R}$ ($i \in \{1,2\}$) defined on a rectangular domain $\Omega \subset \mathbb{R}^2$. Furthermore, let the optical flow field be denoted by $\mathbf{w} = (u,v)^\top : \Omega \rightarrow \mathbb{R}^2$ and let the illumination changes be parametrized by a coefficient field $\mathbf{c} : \Omega \rightarrow \mathbb{R}^n$. Then, inspired by the basic approach of Cornelius and Kanade [7], we propose to jointly compute the optical flow and the illumination changes as minimizer of an energy functional with the following structure:

$$E(\mathbf{w}, \mathbf{c}) = \int_\Omega D(\mathbf{w}, \mathbf{c}) + R_{\mathrm{flow}}(\mathbf{w}) + R_{\mathrm{illum}}(\mathbf{c}) \; d\mathbf{x} \, . \qquad (1)$$

It consists of three terms: a data term $D$ that relates two consecutive frames of the input image sequence via the optical flow and the parametrized illumination changes (in terms of coefficients), a flow regularization term $R_{\mathrm{flow}}$ which encourages a piecewise affine flow field, and a coefficient regularization term $R_{\mathrm{illum}}$ that assumes the coefficient fields to be piecewise smooth. Let us now discuss these terms in detail.

2.1 Data Term

Unlike traditional data terms for optical flow estimation that explain brightness changes in the image sequence exclusively by motion, our data term models changes in illumination as an additional source of brightness variations. In order to estimate these changes jointly with the motion, we make use of a parametrized brightness transfer function (BTF), which was originally proposed by Grossberg and Nayar [17] in the context of photometric calibration for HDR imaging. This function maps intensities of the first frame to their intensities in the second frame. Given a set of $n$ basis functions $\phi_j : \mathbb{R} \rightarrow \mathbb{R}$, the corresponding brightness transfer function reads

$$\Phi(\mathbf{c}, f) = \bar{\phi}(f) + \sum_{j=1}^{n} c_j \cdot \phi_j(f) \, , \qquad (2)$$

where $\bar{\phi} : \mathbb{R} \rightarrow \mathbb{R}$ is the mean brightness transfer function and $\mathbf{c} = (c_1, \ldots, c_n)^\top$ are linear weights. Let us now discuss how to embed this general model for brightness changes into a data term. To this end, we propose the following combination of the brightness constancy assumption and the gradient constancy assumption, which can be seen as an extension of [6]. Defining $\nu$ as a positive weight and using $\nabla$ as the spatial gradient operator, our data term reads

$$D(\mathbf{w}, \mathbf{c}) = D_{\mathrm{bright}}(\mathbf{w}, \mathbf{c}) + \nu \, D_{\mathrm{grad}}(\mathbf{w}, \mathbf{c}) \, , \qquad (3)$$

with

$$D_{\mathrm{bright}}(\mathbf{w}, \mathbf{c}) = \Psi_d\!\left( \left( f_2(\mathbf{x} + \mathbf{w}) - \Phi(\mathbf{c}(\mathbf{x}), f_1(\mathbf{x})) \right)^2 \right) \, , \qquad (4)$$

and

$$D_{\mathrm{grad}}(\mathbf{w}, \mathbf{c}) = \Psi_d\!\left( \left| \nabla f_2(\mathbf{x} + \mathbf{w}) - \nabla \Phi(\mathbf{c}(\mathbf{x}), f_1(\mathbf{x})) \right|_2^2 \right) \, , \qquad (5)$$

where brightness changes are now modeled to be spatially variant, i.e. with non-constant coefficients $\mathbf{c}$. Thus, we allow different brightness transfer functions $\Phi$ at each position. For both assumptions, the same sub-quadratic penalizer $\Psi_d(s^2) = 2\lambda_d^2 \left(1 + s^2/\lambda_d^2\right)^{1/2}$ is used to render the approach more robust w.r.t. outliers [2].

Please note that flow variables and illumination coefficients have intentionally been distributed to different frames. This avoids products of unknowns when linearizing the assumptions later on and thus makes the minimization more tractable. Moreover, at first glance, it may seem counter-intuitive to combine our explicit estimation strategy with a gradient constancy assumption that is invariant under additive illumination changes. However, the additional gradient constancy term supports the estimation at those locations where the coefficients cannot adapt or have not yet adapted perfectly to the illumination changes. This is for instance the case at the beginning of the estimation, when neither the flow nor the coefficients have converged to their final values yet.
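To make the role of the parametrized BTF in the data term more concrete, the following NumPy sketch evaluates Eq. (2) as a per-pixel lookup and forms the brightness constancy residual of Eq. (4). The array shapes, the function names, and the assumption that the second frame has already been warped by the current flow are illustrative choices, not part of the model description above.

```python
import numpy as np

def apply_btf(f1, coeffs, mean_btf, basis):
    """Evaluate the spatially varying BTF Phi(c(x), f1(x)) of Eq. (2).

    f1       : (H, W) first frame, integer intensities in [0, 255]
    coeffs   : (n, H, W) coefficient fields c_1, ..., c_n
    mean_btf : (256,) mean brightness transfer function as a lookup table
    basis    : (n, 256) learned basis functions phi_1, ..., phi_n
    """
    idx = f1.astype(int)                        # intensities act as LUT indices
    out = mean_btf[idx].astype(float)           # contribution of the mean BTF
    for c_j, phi_j in zip(coeffs, basis):
        out += c_j * phi_j[idx]                 # c_j(x) * phi_j(f1(x))
    return out

def brightness_residual(f1, f2_warped, coeffs, mean_btf, basis):
    """Brightness constancy residual of Eq. (4): f2(x + w) - Phi(c(x), f1(x)),
    where f2_warped is the second frame already warped by the current flow."""
    return f2_warped - apply_btf(f1, coeffs, mean_btf, basis)
```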


2.2 Regularization Terms

Since the constancy assumptions in the data term may locally fail to provide any information, a spatial regularization of both the flow variables and the illumination coefficients is required. Moreover, it is not clear from the data term how to distribute observed brightness changes between motion and illumination. While the parametrization in terms of basis functions already provides a meaningful representation given by the coefficient fields, the concrete modeling of both regularization terms plays an important role in resolving this ambiguity. Let us now discuss how we model the two regularizers.

Flow Regularization.

While first order regularization strategies have a long and successful tradition [22], second-order smoothness terms have recently received notable attention. In particular, such terms turned out to be highly useful for non-fronto-parallel motion, since they are tailored towards piecewise affine solutions [5, 35, 42, 46]. Consequently, we make use of the following second-order regularizer that has already been used in the context of image denoising [26] and shape-from-shading [47]:

$$R_{\mathrm{flow}}(\mathbf{w}) = \alpha \cdot \Psi_s\!\left( \| H u \|_F^2 + \| H v \|_F^2 \right) \, . \qquad (6)$$

Here, $\alpha$ is a positive weight, $\| H \cdot \|_F$ denotes the Frobenius norm of the Hessian, and $\Psi_s(s^2) = 2\lambda_s^2 \left(1 + s^2/\lambda_s^2\right)^{1/2}$ is a sub-quadratic penalizer that encourages piecewise affine solutions.

Coefficient Regularization.

In contrast to the flow regularizer that models a piecewise affine flow field, we assume that neighboring pixels are subject to similar illumination changes, i.e. that the coefficients of the basis functions are piecewise constant. Additionally, discontinuities in the coefficient fields are assumed to be aligned with edges in the input images (e.g. shadow edges) [29]. Consequently, we follow the idea of Zimmer et al. [54] and employ the following anisotropic complementary regularization term

$$R_{\mathrm{illum}}(\mathbf{c}) = \beta \cdot \sum_{i=1}^{2} \Psi^i_{\mathrm{illum}}\!\left( \sum_{j=1}^{n} \gamma_j \left( \mathbf{r}_i^\top \nabla c_j \right)^2 \right) \, , \qquad (7)$$

where $\beta$ is a positive weight and the two directions $\mathbf{r}_1$ and $\mathbf{r}_2 \perp \mathbf{r}_1$ allow us to adapt the smoothing direction locally across and along image edges, respectively. As proposed in [54], these directions can be derived as the eigenvectors of the so-called regularization tensor. In our case, this tensor must be computed from the photometrically uncompensated first frame $f_1$ to ensure that brightness information related to illumination changes is not discarded. Moreover, all coefficient fields are regularized jointly with a single penalizer function per direction, since spatial changes of the brightness transfer function typically result in discontinuities in all coefficient fields. In this context, the derivatives of the coefficients have to be balanced with weights $\gamma_j$ to reflect the different magnitude ranges of the coefficient fields. How we can estimate these weights together with the basis functions is discussed in Section 4. Finally, we have to define the penalizer functions. As suggested in [54], we use the edge-enhancing Perona-Malik regularizer


$\Psi^1_{\mathrm{illum}}(s^2) = \lambda_c^2 \log\!\left(1 + s^2/\lambda_c^2\right)$ as penalizer across edges (in $\mathbf{r}_1$-direction), while we apply the edge-preserving Charbonnier regularizer $\Psi^2_{\mathrm{illum}}(s^2) = 2\lambda_c^2 \left(1 + s^2/\lambda_c^2\right)^{1/2}$ along them (in $\mathbf{r}_2$-direction).

3 Minimization

In order to handle large displacements, we follow the warping strategy of Brox et al. [6].

We split the unknowns, i.e. the flow field $\mathbf{w}$ and the coefficient fields $\mathbf{c}$, into a known part $\mathbf{w}^k$, $\mathbf{c}^k$ and an unknown increment $d\mathbf{w}^k$, $d\mathbf{c}^k$, and embed their overall estimation into a coarse-to-fine fixed point iteration. Moreover, within this (outer) fixed point iteration, we linearize the brightness and the gradient constancy assumptions in terms of the flow increments $d\mathbf{w}^k$ such that we finally approximate the original non-convex optimization problem by a series of convex optimization problems.

Actually, this strategy comes down to solving a differential formulation of the original energy (1) at each level $k$ of the coarse-to-fine approach. If we denote the first frame by $f_1^k = f_1(x, y)$ and the motion-compensated second frame by $f_2^k = f_2(x + u^k, y + v^k)$, the corresponding differential formulation of data and smoothness terms is given as follows. While the brightness and gradient constancy terms become

$$D^k_{\mathrm{bright}} = \Psi_d\!\left( \theta \cdot \Big( f^k_{2,x}\, du^k + f^k_{2,y}\, dv^k + f^k_2 - \bar{\phi}(f^k_1) - \sum_{j=1}^{n} (c^k_j + dc^k_j) \cdot \phi_j(f^k_1) \Big)^{\!2} \right) \qquad (8)$$

and

$$D^k_{\mathrm{grad}} = \Psi_d\!\left( \left| \begin{pmatrix} \theta_x & 0 \\ 0 & \theta_y \end{pmatrix} \Big( \nabla f^k_{2,x}\, du^k + \nabla f^k_{2,y}\, dv^k + \nabla f^k_2 - \sum_{j=1}^{n} \phi_j(f^k_1) \cdot \nabla (c^k_j + dc^k_j) - \Big( \bar{\phi}'(f^k_1) + \sum_{j=1}^{n} (c^k_j + dc^k_j) \cdot \phi_j'(f^k_1) \Big) \cdot \nabla f^k_1 \Big) \right|_2^2 \right) \, , \qquad (9)$$

respectively, the flow regularizer is given by

$$R^k_{\mathrm{flow}} = \alpha \cdot \Psi_s\!\left( \| H(u^k + du^k) \|_F^2 + \| H(v^k + dv^k) \|_F^2 \right) \, , \qquad (10)$$

and the coefficient regularizer reads

$$R^k_{\mathrm{illum}} = \beta \cdot \sum_{i=1}^{2} \Psi^i_{\mathrm{illum}}\!\left( \sum_{j=1}^{n} \gamma_j \left( \mathbf{r}_i^\top \nabla (c^k_j + dc^k_j) \right)^2 \right) \, . \qquad (11)$$

Additionally, constraint normalization has been applied to all constancy assumptions in terms of the weights $\theta$, $\theta_x$ and $\theta_y$, as proposed in [44] for linearized constraints with more than two variables.


After discretizing the Euler-Lagrange equations of this differential energy with finite differences, we obtain a nonlinear system of equations due to the derivatives of the sub-quadratic penalizers $\Psi$. This nonlinear system is then solved by another (inner) fixed point iteration: All nonlinear expressions are repeatedly kept fixed, and the resulting linear systems are solved using the successive overrelaxation (SOR) method [52].
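For illustration, a minimal successive overrelaxation solver for a generic linear system is sketched below; in the actual method, the system matrix stems from the discretized Euler-Lagrange equations and is large and sparse, so the dense form, relaxation weight and iteration count here are assumptions chosen purely for didactic purposes.

```python
import numpy as np

def sor_solve(A, b, omega=1.9, iterations=200):
    """Successive overrelaxation (SOR) for a linear system A x = b, cf. [52].

    In the method above, such a system has to be solved in every inner fixed
    point iteration after freezing the nonlinear penalizer terms; here A is
    assumed to be a small dense matrix purely for illustration.
    """
    n = len(b)
    x = np.zeros(n)
    for _ in range(iterations):
        for i in range(n):
            # Gauss-Seidel update relaxed by the factor omega
            sigma = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
            x[i] = (1.0 - omega) * x[i] + omega * (b[i] - sigma) / A[i, i]
    return x
```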

In order to speed up computations, the inner fixed point iteration is embedded in a cascadic multigrid scheme, i.e. finer levels are initialized with coarse-scale solutions [4].

This also explains why we use an incremental computation of the coefficient fields $\mathbf{c}^k$, although the corresponding expressions are linear in the original functional (1). As the cascadic multigrid approach always starts the computation from scratch, we need an increment $d\mathbf{c}^k$ which can be initialized to zero without losing all previous information.

4 Basis Learning for Brightness Transfer Functions

In the previous sections, we have introduced our novel variational model and have sketched how to minimize the corresponding energy. The goal of this section is to explain how we estimate the mean BTF $\bar{\phi}(f)$, the basis functions $\phi_j$, as well as the associated weights $\gamma_j$ for the joint regularizer of the coefficient fields. The basic strategy of this paper is inspired by the Empirical Model of Response (EMoR) of Grossberg and Nayar [17], where the camera response function of imaging systems is parametrized with a set of basis functions. However, our model acts on intensities instead of irradiances. As already mentioned in Section 2, input intensities $f$ are mapped to output intensities via

$$\Phi(f) = \bar{\phi}(f) + \sum_{j=1}^{n} c_j \cdot \phi_j(f) \, . \qquad (12)$$

Note that many kinds of polynomial and exponential illumination models can be represented using the appropriate basis functions. For instance, the affine model of Negahdaripour [30] fits into this framework by choosing

$$\bar{\phi}(f) = 0 \, , \quad \phi_1(f) = 1 \, , \quad \phi_2(f) = f \, . \qquad (13)$$

The same also holds for the purely additive models in [7, 28], i.e. if instead $\phi_2(f) = 0$.

The recent KITTI Vision Benchmark Suite [14] offers a huge set of real-world image sequences together with ground truth optical flow fields. This gives us access to samples of input and output intensity levels of realistic scenarios. In particular, the availability of optical flow fields allows us to register consecutive frames and to analyze the behavior of the true BTF on a per-pixel basis.

Our general strategy to learn a basis from this massive amount of training data consists of three steps: In a first step, we segment and cluster the training images according to illumination changes along the ground truth flow. The segmentation is important, since we cannot expect that different image pairs provide fundamentally different global BTFs. Instead, we have to estimate multiple BTFs per image pair, since typical illumination changes such as drop shadows or specular reflections are local phenomena.

In a second step, we use the segmented input images and compute for each region of each input image a separate BTF. In a third step, all these BTFs are used to perform a principal component analysis (PCA) in order to identify the most representative basis functions for the observed illumination changes. It is worth noting that these steps can be applied iteratively, i.e. the estimated basis functions can be used again to segment the input images and thus to obtain improved BTFs. Let us now detail these steps.

Segmenting Illumination Changes.

Let us assume that we are given the training image sequences with corresponding ground truth flows. In order to discriminate image regions with distinct lighting situations, i.e. with different brightness transfer functions, we first have to determine the pointwise BTF at every pixel of each image pair. However, although this pointwise BTF can be arbitrarily complex, the given images provide only one constraint per pixel: The unknown BTF must map the intensity of this pixel in the first frame to the intensity of the corresponding pixel in the second frame.

To relax this extremely under-determined problem, let us now assume we are already given an estimate of the basis functions. Then, the sought pointwise BTF can be approximated using the given basis, and our task comes down to computing the optimal coefficient vector c in each pixel. Consequently, this problem fits perfectly into our variational model from Section 2, with the difference that we only have to solve for the coefficients c , since the optical flow is given and does not need to be estimated.

However, as the ground truth might not be provided at every pixel (i.e. due to occlusions or due to sparse laser scans), we have to disable the data term at those positions where a flow vector is missing. Basically, this procedure leads to a variational inpainting method [49], because if the data term is disabled, only the coefficient regularization term contributes to the energy. Note that unlike in traditional inpainting scenarios, we are not interested in the coefficient values at positions with missing data. We only want to enforce global communication in order to avoid isolated estimates.

Fig. 1. Left: Frame 1 of KITTI training sequence #114. Right: Corresponding K-means segmentation. Each color indicates a separate cluster; black pixels denote locations where ground truth is missing. The separation between the stronger brightening effect on the street and the weaker brightening effect in the environment becomes visible. Moreover, the inter-reflections at the windshield show up in terms of the three red spots on the street.

Once the coefficients are found, we perform a K-means clustering (usually $K = 5$) on the coefficients. This takes place exclusively in the $n$-dimensional coefficient space; spatial coordinates are intentionally ignored here in order to allow spatially disjoint regions to belong to the same segment. All pixels whose coefficients have been clustered together share a similar brightness transfer function and thus exhibit a similar lighting situation. Figure 1 shows an example where such a segmentation allows to distinguish regions in the image that undergo different brightening effects.
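A possible implementation of this clustering step is sketched below, using scikit-learn's K-means purely for illustration; the coefficient fields, the validity mask, and all array shapes are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_coefficients(coeffs, valid_mask, k=5):
    """Cluster pixels by their BTF coefficient vectors (K-means, K = 5).

    coeffs     : (n, H, W) coefficient fields estimated with the flow fixed
    valid_mask : (H, W) boolean mask of pixels with ground truth flow
    Spatial coordinates are deliberately not used as features, so spatially
    disjoint regions with a similar BTF may end up in the same cluster.
    """
    n, h, w = coeffs.shape
    features = coeffs.reshape(n, -1).T[valid_mask.ravel()]    # (#valid, n)
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(features)
    segmentation = np.full(h * w, -1, dtype=int)              # -1: no ground truth
    segmentation[valid_mask.ravel()] = labels
    return segmentation.reshape(h, w)
```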


Estimating Brightness Transfer Functions.

Given the previously computed segmentation, the next task is to estimate one brightness transfer function $g : \mathbb{R} \rightarrow \mathbb{R}$ per segment. To this end, we adopt the global idea of Grossberg and Nayar [16] locally: For each segment, we construct the intensity histogram $h_1$ of the pixels in the first frame as well as the histogram $h_2$ of the corresponding intensities in the second frame. In this context, we only consider pixels with valid optical flow, i.e. a ground truth vector must be given and must not point out of the image domain. Once both histograms have been created, we compute the BTF that transforms $h_1$ into $h_2$ by means of a histogram specification. In this context, fully saturated segments or too small clusters may lead to wrong and unrealistic brightness transfer functions. To avoid this, we reject any segments in which more than 80% of all pixels have the same intensity, as well as segments in which more than one third of all possible intensities do not occur. Please note that the resulting function of the histogram specification is discrete and given by an arbitrary vector $g \in \mathbb{R}^{256}$ that is not parametrized in terms of basis functions and coefficients. Along with the BTFs from other segments, it serves as input for the following PCA.
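The histogram specification step can be sketched as follows; the function assumes 8-bit intensities and returns a discrete BTF vector of length 256 as described above, while the variable names and the searchsorted-based CDF matching are illustrative choices.

```python
import numpy as np

def btf_from_histograms(f1_vals, f2_vals, levels=256):
    """Estimate a discrete BTF for one segment via histogram specification,
    in the spirit of Grossberg and Nayar [16]: the BTF is the monotone map
    that transforms the histogram h1 of the first-frame intensities into the
    histogram h2 of the corresponding second-frame intensities.

    f1_vals, f2_vals : 1-D arrays of corresponding intensities in one segment
    """
    h1, _ = np.histogram(f1_vals, bins=levels, range=(0, levels))
    h2, _ = np.histogram(f2_vals, bins=levels, range=(0, levels))
    cdf1 = np.cumsum(h1) / h1.sum()
    cdf2 = np.cumsum(h2) / h2.sum()
    # for every input intensity, pick the output intensity with matching CDF value
    g = np.searchsorted(cdf2, cdf1, side='left').clip(0, levels - 1)
    return g.astype(float)
```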

Learning the Basis.

After having performed the previous clustering and estimation steps on each of the $p$ training image pairs, we obtain $m \leq K \cdot p$ brightness transfer functions, so-called observations. In order to find one common set of basis functions for all of them, we perform a principal component analysis (PCA). After concatenating all observations $\mathbf{g}_i$ ($i = 1, \ldots, m$) into a so-called observation matrix

$$G = (\mathbf{g}_1 \,|\, \ldots \,|\, \mathbf{g}_m) \in \mathbb{R}^{256 \times m} \, , \qquad (14)$$

we compute the row-wise mean (i.e. the sample mean over all observations) $\bar{\mathbf{g}}$ of $G$. Then we obtain the covariance matrix $C$ as

$$C = U^\top \Sigma\, U = \frac{1}{m-1} \sum_{i=1}^{m} (\mathbf{g}_i - \bar{\mathbf{g}})(\mathbf{g}_i - \bar{\mathbf{g}})^\top \, . \qquad (15)$$

From this principal component decomposition, the sought basis functions $\phi_j$ ($j = 1, \ldots, n$) can be found as the eigenvectors of the covariance matrix (the columns of $U$). Moreover, the row-wise mean $\bar{\mathbf{g}}$ coincides with the $0$-th basis function, which is the mean brightness transfer function $\bar{\phi}$. Furthermore, the diagonal matrix $\Sigma$ contains the eigenvalues, which represent the variance of the given data along the principal components. This is a well-suited estimate for the relative magnitude of the coefficients. Hence, we set the weights $\gamma_j$ in the anisotropic coefficient regularization term (7) to be the inverse square roots of the eigenvalues.
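A compact NumPy sketch of this PCA step is given below; it assumes the observation matrix of Eq. (14) has already been assembled and returns the mean BTF, the first $n$ basis functions and the weights $\gamma_j$, with function and variable names chosen for illustration only.

```python
import numpy as np

def learn_btf_basis(G, n=4):
    """PCA over the observed segment-wise BTFs (columns of G, Eq. (14)).

    G : (256, m) observation matrix with one discrete BTF per column
    Returns the mean BTF, the n most representative basis functions
    (eigenvectors of the covariance matrix) and the regularization weights
    gamma_j = 1 / sqrt(eigenvalue_j) used in the coefficient term (7).
    """
    g_mean = G.mean(axis=1)                           # row-wise mean (mean BTF)
    centered = G - g_mean[:, None]
    cov = centered @ centered.T / (G.shape[1] - 1)    # 256 x 256 covariance (15)
    eigvals, eigvecs = np.linalg.eigh(cov)            # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n]             # keep the n largest
    basis = eigvecs[:, order].T                       # (n, 256) basis functions
    gamma = 1.0 / np.sqrt(eigvals[order])             # balancing weights
    return g_mean, basis, gamma
```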

Figure 2 shows the basis estimated for the KITTI Vision Benchmark Suite and compares it to an affine basis and the EMoR basis provided by [17]. We can see that compared to the EMoR basis, our basis functions for the KITTI benchmark rather model illumination changes in the upper part of the dynamic range. Moreover, the mean brightness transfer function is rather linear, since we do not estimate a camera response function as in [17] but a mapping between intensities (where identity is expected as average).


Fig. 2. Comparison of different basis functions. From left to right: (a) Normalized affine basis. (b) EMoR functions [17]. (c) Our basis functions learned from KITTI ground truth data.

Iterating the Estimation.

The strategy we have described so far assumes a basis to be given for the clustering step. Initially, however, only the training images and ground truth flows are given. Thus, in our first iteration loop we omit the clustering step, treat the whole images as one segment, and estimate one global brightness transfer function per image pair. This leads to a first estimate for the basis which allows us then to perform the clustering as described. The impact of iterating the estimation of the basis functions on their shapes can be seen in Figure 3. While the mean BTF remains approximately the identity, the main support of the other basis functions is even further shifted towards the upper end of the dynamic range.


Fig. 3. Impact of iterating the estimation of the KITTI basis functions. From left to right: (a) Initial basis. (b) After one iteration. (c) After four iterations.

5 Evaluation

Our experiments are focused on the KITTI Vision Benchmark Suite [14] which offers a large amount of images depicting driving scenarios with challenging illumination changes. With our experiments we want to demonstrate that our method is very well suited for this kind of real-world imagery. The runtime of our single core implementation on an Intel XEON workstation with 3.2 GHz is about 80 seconds for each of the sequences (image size 1240 × 376 ). Results of our experiments are given in terms of the bad pixel 3 (BP3) error measure that describes the percentage of estimated flow vectors that differ by more than 3 pixels from the respective ground truth.
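As a reference for how this measure can be computed, a small sketch is given below; the flow array layout and the validity mask are assumptions, and the KITTI development kit provides the official evaluation code.

```python
import numpy as np

def bad_pixel_error(flow_est, flow_gt, valid_mask, threshold=3.0):
    """Percentage of flow vectors whose endpoint error exceeds `threshold`
    pixels (BP3 for threshold = 3), evaluated where ground truth is valid.

    flow_est, flow_gt : (H, W, 2) flow fields, valid_mask : (H, W) boolean
    """
    err = np.linalg.norm(flow_est - flow_gt, axis=-1)   # per-pixel endpoint error
    bad = (err > threshold) & valid_mask
    return 100.0 * bad.sum() / valid_mask.sum()
```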


Table 1. Comparison of different variants of our method on the full KITTI training set.

Configuration                                          avg. BP3 error (occ)
Baseline (without illumination compensation)           11.17 %
Affine basis                                            11.07 %
EMoR basis                                              10.64 %
KITTI basis                                             10.71 %
KITTI basis (iterated)                                  10.19 %
KITTI basis (iterated, without gradient constancy)      10.65 %
KITTI basis (iterated, only gradient constancy)         10.95 %

Although our model contains a considerable number of parameters, we effectively kept most of them fixed and only adjusted the three main model parameters $\alpha$, $\beta$, and $\nu$. The contrast parameters of the sub-quadratic penalizer functions have been chosen fixed for all experiments as $(\lambda_d, \lambda_s, \lambda^{1,2}_{\mathrm{illum}}) = (0.01, 0.5, 0.01)$. Concerning the K-means clustering step, we kept $K = 5$ fixed as well. For the number $n$ of basis functions, we found $n = 4$ to be a good tradeoff between computational effort and accuracy.

Evaluation of Basis Functions.

In our first experiment, we investigate the usefulness of illumination estimation in general and analyze the impact of choosing different sets of basis functions on the quality of the flow estimation. To this end, we have computed the average BP3 errors of our method – with different bases as well as without any illumination compensation – on the provided set of 194 training sequences. For each configuration, we have optimized the parameters of our model w.r.t. the average BP3 error for the whole training set using the ground truth including occluded pixels ( occ ).

The results of this experiment are presented in Table 1. On the one hand, they clearly show that the benefit of using the classical affine model of Gennert and Negahdaripour [15] is rather limited compared to the baseline method without any illumination compensation – the error hardly decreases on average. On the other hand, we can observe a clear improvement when choosing or learning a more suitable (and less ad-hoc) set of basis functions. Moreover, our experiments show that refining the basis functions iteratively allows us to further improve the results. In the end, our proposed illumination model has been able to improve results for 81% of all training sequences.

Finally, we also analyze the impact of the gradient constancy assumption. To this end, we consider variants of our method where the gradient constancy term has either been disabled or is the only constancy assumption. The last two rows of Table 1 show that in both cases the results deteriorate. This underlines our considerations from Section 2: The gradient term provides an improved initialization of the flow in early iterations, where the coefficient fields have not yet converged. However, the gradient term alone cannot provide sufficient information for estimating the basis coefficients, since it only considers local differences, whereas the BTF establishes a mapping between absolute grey values.

Analysis of Transfer Functions and Coefficient Fields.

In our second experiment we shed light on the coefficients that are estimated jointly with the optical flow. To this end, we have picked one of the training sequences with moderate illumination changes, see Figure 4, and another sequence with severe illumination changes, see Figure 5. The two figures show the first frame of the respective sequence together with the estimated flow as well as the four computed coefficient fields. Furthermore, we have highlighted interesting locations in the images using colored squares. The brightness transfer functions at those locations – that can be computed as linear combinations of the learned basis functions weighted by the estimated local coefficients – are jointly depicted in a graph using the corresponding colors.

For our first challenging example (Figure 4), the flow field appears reasonably accurate, which is confirmed by a BP3 error of only 9.41%. Since the lighting changes in that image sequence are rather global, the extracted brightness transfer functions are similar in shape. In fact, they only differ in the upper end of the dynamic range. As can be seen from the BTFs, the image becomes darker. This is mainly reflected by the strongly negative values in the coefficient field $c_1$ (which belongs to a positive basis function). Moreover, slight local variations of the BTFs can be observed in the coefficient plots, in particular in the plots of the coefficient fields $c_2$, $c_3$, $c_4$. An example where our model has actually estimated significantly differing BTFs for different parts of the image is presented in Figure 5. Particularly challenging in this sequence are the inter-reflections in the windshield in front of the camera. However, the flow field is still of reasonable quality (BP3 of 5.33%). Apart from the spatially varying BTFs, one can also observe that the inter-reflections are reproduced by the corresponding coefficient fields.


Fig. 4. Estimated coefficients and BTFs. Left column, from top to bottom: First frame of KITTI training sequence #15 with three highlighted positions, estimated optical flow field, plot of the three corresponding brightness transfer functions. Plot colors coincide with the marker colors. Right column, from top to bottom: Estimated coefficient fields $c_1$ to $c_4$. Coefficients have been shifted such that a grey value of 127 denotes a coefficient of 0. Brighter values denote positive coefficients, darker values negative coefficients.



Fig. 5. Estimated coefficients and BTFs. Left column, from top to bottom: First frame of KITTI training sequence #114 with three highlighted positions, estimated optical flow field, plot of the three corresponding brightness transfer functions. Plot colors coincide with the marker colors. Right column, from top to bottom: Estimated coefficient fields $c_1$ to $c_4$. Coefficients have been shifted such that a grey value of 127 denotes a coefficient of 0. Brighter values denote positive coefficients, darker values negative coefficients.

Table 2. Error statistics of our method for the bad pixel measure with varying thresholds (BP2-BP5), averaged over all sequences of the KITTI evaluation benchmark.

Error      Out-Noc   Out-All   Avg-Noc   Avg-All
2 pixels   8.84 %    14.14 %   1.5 px    2.8 px
3 pixels   6.52 %    11.03 %   1.5 px    2.8 px
4 pixels   5.38 %    9.29 %    1.5 px    2.8 px
5 pixels   4.64 %    8.11 %    1.5 px    2.8 px

Comparison to the Literature.

In our third experiment, we compare our method to other approaches from the literature. To this end, we evaluated our method on the KITTI test sequences using the optimized parameters $(\alpha, \beta, \nu) = (5.2, 3, 6.25)$. The corresponding results are shown in Tables 2 and 3. While Table 2 gives detailed information on the performance of our algorithm in non-occluded and all regions for different thresholds of the bad pixel error measure (BP2-BP5), Table 3 shows the performance of our algorithm compared to other pure two-frame optical flow methods without stereo constraints (such constraints are likely to fail in realistic scenarios with independently moving objects). As one can see, our method is among the leading optical flow approaches in this benchmark. In particular, when considering all pixels (i.e. also occluded regions), our method ranks first and is significantly more accurate than previous approaches. This clearly demonstrates that performing a joint estimation of illumination changes and motion can outperform methods discarding illumination information by using invariants.

Table 3. Comparison of pure two-frame optical flow methods for the KITTI evaluation sequences. Numbers in parentheses denote the rank of each method in the corresponding column at the time of submission.

Method          Out-Noc        Out-All        Avg-Noc       Avg-All
DDR-DF          6.03 % (1)     13.08 % (2)    1.6 px (5)    4.2 px (3)
TGV2ADCSIFT     6.20 % (2)     15.15 % (4)    1.5 px (2)    4.5 px (4)
Our method      6.52 % (3)     11.03 % (1)    1.5 px (2)    2.8 px (1)
Data-Flow       7.11 % (4)     14.57 % (3)    1.9 px (6)    5.5 px (5)
EpicFlow        7.19 % (5)     16.15 % (5)    1.4 px (1)    3.7 px (2)
DeepFlow        7.22 % (6)     17.79 % (6)    1.5 px (2)    5.8 px (7)
TVL1-HOG        7.91 % (7)     18.90 % (10)   2.0 px (7)    6.1 px (8)
MLDP-OF         8.67 % (8)     18.78 % (9)    2.4 px (9)    6.7 px (11)
DescFlow        8.76 % (9)     19.45 % (11)   2.1 px (8)    5.7 px (6)
CRTflow         9.43 % (10)    18.72 % (8)    2.7 px (11)   6.5 px (9)
C++             10.04 % (11)   20.26 % (12)   2.6 px (10)   7.2 px (13)
C+NL            10.49 % (12)   20.64 % (13)   2.8 px (13)   7.1 px (12)
IVANN           10.68 % (13)   21.09 % (14)   2.7 px (11)   7.4 px (14)
fSGM            10.74 % (14)   22.66 % (15)   3.2 px (15)   12.2 px (15)
TGV2CENSUS      11.03 % (15)   18.37 % (7)    2.9 px (14)   6.6 px (10)

6 Conclusions and Outlook

In this work we have addressed the problem of estimating the optical flow under uncontrolled illumination. In contrast to recent state-of-the-art methods that simply discard illumination information, we have proposed a novel variational model for jointly estimating both illumination changes and optical flow. In this context, we have contributed in three different ways: (i) In order to find a meaningful representation of illumination changes, we have learned brightness transfer basis functions from previously segmented training data. (ii) By imposing a complementary regularizer on the corresponding coefficient fields, we have been able to achieve a sharp separation between areas of different illumination changes while maintaining smoothness of the resulting flow field itself. (iii) By embedding both the basis functions and the coefficient regularization into a recent variational framework, we achieve state-of-the-art accuracy on the KITTI benchmark, outperforming competing approaches based on illumination-invariant assumptions.

This shows that approaches that additionally estimate relevant information, such as illumination changes, are a worthwhile alternative to approaches that simply discard that information for the sake of robustness. Moreover, such approaches can provide subsequent algorithms with this additional information, thus improving the overall performance. Future work includes the online learning of basis functions as well as the development of efficient numerical schemes on the GPU to speed up the computation.


References


1. Barnes, C., Shechtman, E., Goldman, D.B., Finkelstein, A.: The generalized PatchMatch correspondence algorithm. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) Computer Vision – ECCV 2010, Lecture Notes in Computer Science, vol. 6312, pp. 29–43. Springer (2010)
2. Black, M.J., Anandan, P.: Robust dynamic motion estimation over time. In: Proc. 1991 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. pp. 292–302. IEEE Computer Society Press, Maui, HI (Jun 1991)
3. Black, M.J., Fleet, D., Yacoob, Y.: Robustly estimating changes in image appearance. Computer Vision and Image Understanding 78(1), 8–31 (2000)
4. Bornemann, F., Deuflhard, P.: The cascadic multigrid method for elliptic problems. Numerische Mathematik 75, 135–152 (1996)
5. Braux-Zin, J., Dupont, R., Bartoli, A.: A general dense image matching framework combining direct and feature-based costs. In: Proc. IEEE International Conference on Computer Vision (ICCV). pp. 185–192. IEEE Press (2013)
6. Brox, T., Bruhn, A., Papenberg, N., Weickert, J.: High accuracy optical flow estimation based on a theory for warping. In: Pajdla, T., Matas, J. (eds.) Computer Vision – ECCV 2004, Lecture Notes in Computer Science, vol. 3024, pp. 25–36. Springer (2004)
7. Cornelius, N., Kanade, T.: Adapting optical-flow to measure object motion in reflectance and X-ray image sequences. Computer Graphics 18(1), 24–25 (1984)
8. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: Schmid, C., Soatto, S., Tomasi, C. (eds.) Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). vol. 2, pp. 886–893 (2005)
9. Debevec, P.E., Malik, J.: Recovering high dynamic range radiance maps from photographs. In: Proc. SIGGRAPH 97. pp. 369–378. Annual Conference Series, ACM Press (1997)
10. Dederscheck, D., Müller, T., Mester, R.: Illumination invariance for driving scene optical flow using comparagram preselection. In: Proc. IEEE Intelligent Vehicles Symposium (IV). pp. 742–747 (2012)
11. Demetz, O., Hafner, D., Weickert, J.: The complete rank transform: A tool for accurate and morphologically invariant matching of structures. In: Proceedings of the British Machine Vision Conference. BMVA Press (2013)
12. Garg, R., Roussos, A., Agapito, L.: A variational approach to video registration with subspace constraints. International Journal of Computer Vision 104(3), 286–314 (2013)
13. Garrido, P., Valgaerts, L., Wu, C., Theobalt, C.: Reconstructing detailed dynamic face geometry from monocular video. ACM Transactions on Graphics 32(6), 158:1–158:10 (2013)
14. Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The KITTI vision benchmark suite. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 3354–3361. IEEE Computer Society (2012)
15. Gennert, M.A., Negahdaripour, S.: Relaxing the brightness constancy assumption in computing optical flow. Tech. Rep. 975, Artificial Intelligence Laboratory, Massachusetts Institute of Technology (June 1987)
16. Grossberg, M.D., Nayar, S.K.: What can be known about the radiometric response from images? In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) Computer Vision – ECCV 2002, Lecture Notes in Computer Science, vol. 2350, pp. 189–205. Springer (2002)
17. Grossberg, M.D., Nayar, S.K.: Modeling the space of camera response functions. IEEE Transactions on Pattern Analysis and Machine Intelligence 26(10), 1272–1282 (2004)
18. HaCohen, Y., Shechtman, E., Goldman, D.B., Lischinski, D.: Non-rigid dense correspondence with applications for image enhancement. ACM Transactions on Graphics 30(4), 70:1–70:9 (2011)
19. Hager, G.D., Belhumeur, P.N.: Real-time tracking of image regions with changes in geometry and illumination. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 403–410 (1996)
20. Haussecker, H.W., Fleet, D.J.: Estimating optical flow with physical models of brightness variation. IEEE Transactions on Pattern Analysis and Machine Intelligence 23(6), 661–673 (2001)
21. Hermosillo, G., Chefd'Hotel, C., Faugeras, O.: Variational methods for multimodal image matching. International Journal of Computer Vision 50(3), 329–343 (2002)
22. Horn, B., Schunck, B.: Determining optical flow. Artificial Intelligence 17, 185–203 (1981)
23. Kim, T.H., Lee, H.S., Lee, K.M.: Optical flow via locally adaptive fusion of complementary data costs. In: Proc. IEEE International Conference on Computer Vision (ICCV). pp. 3344–3351. IEEE Press (2013)
24. Liu, C., Yuen, J., Torralba, A.: SIFT flow: Dense correspondence across scenes and its applications. IEEE Transactions on Pattern Analysis and Machine Intelligence 33(5), 978–994 (2011)
25. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60(2), 91–110 (2004)
26. Lysaker, M., Lundervold, A., Tai, X.C.: Noise removal using fourth-order partial differential equation with applications to medical magnetic resonance images in space and time. IEEE Transactions on Image Processing 12(12), 1579–1590 (2003)
27. Mileva, Y., Bruhn, A., Weickert, J.: Illumination-robust variational optical flow with photometric invariants. In: Hamprecht, F.A., Schnörr, C., Jähne, B. (eds.) Pattern Recognition, Lecture Notes in Computer Science, vol. 4713, pp. 152–162. Springer (2007)
28. Mukawa, N.: Estimation of shape, reflection coefficients and illuminant direction from image sequences. In: Proc. IEEE International Conference on Computer Vision (ICCV). pp. 507–512 (1990)
29. Nagel, H.H., Enkelmann, W.: An investigation of smoothness constraints for the estimation of displacement vector fields from image sequences. IEEE Transactions on Pattern Analysis and Machine Intelligence 8, 565–593 (1986)
30. Negahdaripour, S., Yu, C.H.: A generalized brightness change model for computing optical flow. In: Proc. IEEE International Conference on Computer Vision (ICCV). pp. 2–11. IEEE Computer Society (1993)
31. Nir, T., Bruckstein, A.M., Kimmel, R.: Over-parameterized variational optical flow. International Journal of Computer Vision 76(2), 205–216 (2008)
32. Ochs, P., Malik, J., Brox, T.: Segmentation of moving objects by long term video analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence (2013), early access
33. Onkarappa, N., Sappa, A.: Speed and texture: an empirical study on optical-flow accuracy in ADAS scenarios. IEEE Transactions on Intelligent Transportation Systems 15(1), 136–147 (2014)
34. Papenberg, N., Bruhn, A., Brox, T., Didas, S., Weickert, J.: Highly accurate optic flow computation with theoretically justified warping. International Journal of Computer Vision 67(2), 141–158 (Apr 2006)
35. Ranftl, R., Gehrig, S., Pock, T., Bischof, H.: Pushing the limits of stereo using variational stereo estimation. In: IEEE Intelligent Vehicles Symposium. pp. 401–407 (2012)
36. Rashwan, H., Mohamed, M., Garcia, M., Mertsching, B., Puig, D.: Illumination robust optical flow model based on histogram of oriented gradients. In: Weickert, J., Hein, M., Schiele, B. (eds.) Pattern Recognition. Lecture Notes in Computer Science, vol. 8142, pp. 354–363. Springer, Berlin (2013)
37. Sadek, R., Facciolo, G., Arias, P., Caselles, V.: A variational model for gradient-based video editing. International Journal of Computer Vision 103(1), 127–162 (2013)
38. Schnörr, C.: On functionals with greyvalue-controlled smoothness terms for determining optical flow. IEEE Transactions on Pattern Analysis and Machine Intelligence 15(10), 1074–1079 (1993)
39. Stein, F.: Efficient computation of optical flow using the census transform. In: Rasmussen, C.E., Bülthoff, H.H., Schölkopf, B., Giese, M.A. (eds.) Pattern Recognition, Lecture Notes in Computer Science, vol. 3175, pp. 79–86. Springer (2004)
40. Steinbrücker, F., Pock, T., Cremers, D.: Advanced data terms for variational optic flow estimation. In: Magnor, M.A., Rosenhahn, B., Theisel, H. (eds.) Proceedings of the Vision, Modeling, and Visualization Workshop (VMV). pp. 155–164. DNB (2009)
41. Sun, D., Roth, S., Black, M.J.: A quantitative analysis of current practices in optical flow estimation and the principles behind them. International Journal of Computer Vision 106(2), 115–137 (2014)
42. Trobin, W., Pock, T., Cremers, D., Bischof, H.: An unbiased second-order prior for high-accuracy motion estimation. In: Rigoll, G. (ed.) Pattern Recognition. vol. 5096, pp. 396–405. Springer, Berlin, Heidelberg (2008)
43. Uras, S., Girosi, F., Verri, A., Torre, V.: A computational approach to motion perception. Biological Cybernetics 60, 79–87 (1988)
44. Valgaerts, L., Bruhn, A., Zimmer, H., Weickert, J., Stoll, C., Theobalt, C.: Joint estimation of motion, structure and geometry from stereo sequences. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) Computer Vision – ECCV 2010, Lecture Notes in Computer Science, vol. 6314, pp. 568–581. Springer, Berlin (2010)
45. van de Weijer, J., Gevers, T.: Robust optical flow from photometric invariants. In: Proc. IEEE International Conference on Image Processing (ICIP). pp. 1835–1838 (2004)
46. Vogel, C., Roth, S., Schindler, K.: An evaluation of data costs for optical flow. In: Weickert, J., Hein, M., Schiele, B. (eds.) Pattern Recognition. Lecture Notes in Computer Science, vol. 8142, pp. 343–353. Springer, Berlin (2013)
47. Vogel, O., Bruhn, A., Weickert, J., Didas, S.: Direct shape-from-shading with adaptive higher order regularisation. In: Sgallari, F., Murli, F., Paragios, N. (eds.) Scale Space and Variational Methods in Computer Vision, Lecture Notes in Computer Science, vol. 4485, pp. 871–882. Springer, Berlin (2007)
48. Wedel, A., Pock, T., Zach, C., Cremers, D., Bischof, H.: An improved algorithm for TV-L1 optical flow. In: Cremers, D., Rosenhahn, B., Yuille, A.L., Schmidt, F.R. (eds.) Statistical and Geometrical Approaches to Visual Motion Analysis. Lecture Notes in Computer Science, vol. 5604, pp. 23–45. Springer (September 2008)
49. Weickert, J., Welk, M.: Tensor field interpolation with PDEs. In: Weickert, J., Hagen, H. (eds.) Visualization and Processing of Tensor Fields, pp. 315–325. Springer (2006)
50. Werlberger, M., Pock, T., Bischof, H.: Motion estimation with non-local total variation regularization. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 2464–2471 (June 2010)
51. Xu, L., Jia, J., Matsushita, Y.: Motion detail preserving optical flow estimation. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 1293–1300. IEEE Computer Society Press (2010)
52. Young, D.M.: Iterative Solution of Large Linear Systems. Academic Press, New York (1971)
53. Zabih, R., Woodfill, J.: Non-parametric local transforms for computing visual correspondence. In: Eklundh, J.O. (ed.) Computer Vision – ECCV 1994, Lecture Notes in Computer Science, vol. 800, pp. 151–158. Springer, Berlin (1994)
54. Zimmer, H., Bruhn, A., Weickert, J.: Optic flow in harmony. International Journal of Computer Vision 93(3), 368–388 (2011)
