

OPTICAL FLOW ESTIMATION BASED ON THE EXTRACTION OF MOTION PATTERNS

J. Chamorro-Martínez and J. Fdez-Valdivia
Department of Computer Science and Artificial Intelligence, University of Granada, Spain
e-mail: {jesus,jfv}@decsai.ugr.es

ABSTRACT

In this paper, a new methodology for optical flow estimation that is able to represent multiple motions is presented. To separate motions at the same location, a new frequency-domain approach is used. This model, based on band-pass filtering with a set of logGabor spatio-temporal filters, groups together filter responses with continuity in their motion (each group defines a motion pattern). Given a motion pattern, the gradient constraint is applied to the output of each filter in order to obtain multiple estimates of the velocity at the same location. Then, the velocities at each point of the motion pattern are combined using probabilistic rules. The use of "motion patterns" makes it possible to represent multiple motions, while the combination of estimates from different filters helps to reduce the initial aperture problem. The technique is illustrated on real and simulated data sets, including sequences with occlusions and transparencies.

Keywords: Optical flow, multiple motions, spatio-temporal models, motion pattern.

1. INTRODUCTION

The estimation of optical flow, an approximation to image motion, is an important problem in the processing of image sequences. Many techniques have been proposed in the literature: for example, differential methods, which rely on the assumption that the intensity levels in the image remain constant over time [1]; matching techniques, which operate by matching small regions of intensity; and frequency-based methods, which are based on spatio-temporally oriented filters [2, 3]. An important point to take into account in optical flow estimation is the presence of multiple motions at the same location. Occlusions and transparencies are two common examples of this phenomenon, where traditional methods fail.
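For background, the differential methods just mentioned build on the gradient (brightness-constancy) constraint f_x v_x + f_y v_y + f_t = 0 [1], solved by least squares over a local window. The following minimal sketch illustrates that constraint on a synthetic pattern; the pattern, window size and velocity are illustrative assumptions, not data from this paper. Note that the window yields a single velocity, which is precisely what fails under multiple motions.

```python
import numpy as np

def lucas_kanade_point(frames, y, x, win=7):
    """Least-squares solution of the gradient constraint
    fx*vx + fy*vy + ft = 0 over a small window (Lucas-Kanade style)."""
    f0, f1, f2 = frames                 # three consecutive frames
    fy_, fx_ = np.gradient(f1)          # spatial central differences
    ft_ = (f2 - f0) / 2.0               # temporal central difference
    h = win // 2
    sl = (slice(y - h, y + h + 1), slice(x - h, x + h + 1))
    A = np.stack([fx_[sl].ravel(), fy_[sl].ravel()], axis=1)
    b = -ft_[sl].ravel()
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return v                            # (vx, vy)

# Illustrative pattern translating at v = (1, 0) pixels/frame.
ys, xs = np.mgrid[0:64, 0:64].astype(float)
frames = [np.sin(0.3 * (xs - t)) * np.cos(0.2 * ys) for t in (0, 1, 2)]
vx, vy = lucas_kanade_point(frames, 32, 32)
print(round(vx, 2), round(vy, 2))  # close to 1.0 and 0.0
```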
These problems are currently being addressed by the research community; see, for example, the strategies based on mixed velocity distributions (usually two) at each point [4], the models based on line processes [5], and the parametric models [6]. Another important group of techniques is based on spatio-temporal filters [2]. These approaches are derived by considering the motion problem in the Fourier domain: the spectrum of a spatio-temporal translation lies in a plane whose orientation depends on the direction and velocity of the motion. Although such filters are a powerful tool to separate the motions present in a sequence [7], the main problem of these schemes is that orientation selectivity tends to increase the aperture problem. Moreover, components of the same motion with different spatial characteristics are separated into different filter responses.

In this paper, we develop a methodology for optical flow estimation that is able to represent multiple motions. To separate motions at the same location, the model introduced in [7] is used. This model is a frequency-based approach that groups filter responses with continuity in their motion (each group defines a motion pattern). This grouping eliminates the problems described above relating to spatial dependency. Given a motion pattern (a group of filters), we first apply the gradient constraint to the output of each filter in order to obtain multiple estimates of the velocity at the same location. Then we combine the velocities at each point of the motion pattern using probabilistic rules. The use of "motion patterns" makes it possible to represent multiple motions, while the combination of estimates from different filters helps to reduce the initial aperture problem.

2. MOTION PATTERNS

To separate motions at the same location, the frequency-domain approach introduced in [7] is used. Figure 1 shows a general diagram describing how the data flow through the model. The diagram illustrates the analysis of a given sequence showing a clap of hands; the goal of analyzing this sequence is to separate the two hand motions. In a first stage, a three-dimensional representation is built from the original sequence and its Fourier transform is calculated. Given a bank of spatio-temporal logGabor filters, a subset of them is selected in order to extract the significant spectral information. These selected filters are applied over the original spatio-temporal image in order to obtain a set of active responses (note that only a subset of the filters is used). In the second stage, for each pair of active filters, their responses are compared on the basis of the distance between their statistical structures, computed over the relevant points of each filter (calculated as local energy peaks on the filter response). As a result, a set of distances between active filters is obtained [8].
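The Fourier-domain rationale behind this approach, namely that the spectrum of a translating pattern lies on a plane whose orientation encodes the velocity, can be verified numerically. A small sketch, assuming a synthetic wrap-around translation; the texture, size and velocity are illustrative:

```python
import numpy as np

# A random texture translating at v = (2, 1) pixels/frame; np.roll makes
# the translation exact (wrap-around), so the plane property holds exactly.
rng = np.random.default_rng(1)
g = rng.standard_normal((32, 32))
vx, vy = 2, 1
seq = np.stack([np.roll(g, (t * vy, t * vx), axis=(0, 1)) for t in range(32)])

# 3-D spectrum; axes of F are (omega_t, omega_y, omega_x).
F = np.fft.fftn(seq)
wt, wy, wx = np.meshgrid(np.fft.fftfreq(32), np.fft.fftfreq(32),
                         np.fft.fftfreq(32), indexing="ij")

# For a pure translation, all energy satisfies wt + vx*wx + vy*wy = 0 (mod 1).
energy = np.abs(F) ** 2
d = (wt + vx * wx + vy * wy) % 1.0
on_plane = np.isclose(d, 0.0) | np.isclose(d, 1.0)
frac = energy[on_plane].sum() / energy.sum()
print(frac)  # close to 1.0: the spectrum is concentrated on the motion plane
```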
In a third stage, a clustering based on the distances between the active filter responses is performed in order to highlight invariance of responses. Each of the clusters obtained in this stage defines a motion pattern. In Figure 1, two collections of filters have been obtained for the input sequence.

0-7803-7750-8/03/$17.00 ©2003 IEEE.    ICIP 2003

[Figure 1 diagram: spatio-temporal filtering (bank of filters) → active filters → active filter responses → statistical structure at the relevant points of each response G_i → distance Distance(G_i, G_j) between statistical structures → clustering of active filter responses into motion patterns.]
Fig. 1. A general diagram of the frequency-based model.

3. OPTICAL FLOW ESTIMATION

In this section, the frequency-based model introduced in Section 2 is used to obtain an optical flow estimation able to represent multiple motions. In Section 3.1, a technique based on the classic gradient constraint is proposed to obtain the optical flow estimation corresponding to each filter response. In Section 3.2, a methodology to integrate the estimations corresponding to the grouped filters in each motion pattern is described. Finally, in Section 3.3, the proposed multiple-motion representation is defined.

3.1. Estimation of a spatio-temporal filter response

To estimate the velocity v_i at a given point (x, y, t) of the i-th filter \phi_i, an analysis similar to the probabilistic approach proposed in [9] is used. Thus, using the odd response of the filter, the velocity at a given point (x, y, t) is defined on the basis of a Gaussian random variable v_i with mean \mu_{v_i} and covariance \Sigma_{v_i}:

    v_i \sim N(\mu_{v_i}, \Sigma_{v_i}), \quad i = 1, \ldots, N    (1)

where \mu_{v_i} and \Sigma_{v_i} are calculated as

    \mu_{v_i} = -\Sigma_{v_i} \cdot \sum_{r=1}^{R} \frac{w_r \, d_r}{\gamma_1 \|f_e\|^2 + \gamma_2}    (2)

    \Sigma_{v_i} = \left[ \sum_{r=1}^{R} \frac{w_r \, M_r}{\gamma_1 \|f_e\|^2 + \gamma_2} + \Lambda_p^{-1} \right]^{-1}    (3)

with R being the number of points in the neighborhood of (x, y, t), w_r a weight vector that gives more influence to elements at the center of the neighborhood than to those at the periphery, \Lambda_p the covariance of the prior distribution of v_i [9], and M_r and d_r defined as

    M_r = \begin{pmatrix} f_x^2 & f_x f_y \\ f_x f_y & f_y^2 \end{pmatrix}, \qquad d_r = \begin{pmatrix} f_x f_t \\ f_y f_t \end{pmatrix}    (4)

with f_e = (f_x, f_y) and f_t being the spatial and temporal partial derivatives [9] (for the sake of simplicity, the spatio-temporal parameters (x, y, t) are omitted from the notation). Thus, given a point (x, y, t), we will have one estimation for each active filter.

3.1.1. Confidence measure

The covariance matrix \Sigma_{v_i} can be used to define a confidence measure for the estimation v_i [9]. In this paper, we use the smallest eigenvalue of \Sigma_{v_i}^{-1} as the confidence measure of v_i [10]; it will be denoted \lambda_{v_i}:

    \lambda_{v_i} = \min\{\lambda_1^i, \lambda_2^i\}    (5)

where \lambda_1^i and \lambda_2^i are the two eigenvalues of \Sigma_{v_i}^{-1} (for the sake of simplicity, the spatio-temporal parameters are omitted in the notation \lambda_{v_i}(x, y, t)). Therefore, an estimation v_i at a given point (x, y, t) of the i-th filter \phi_i will be accepted if \lambda_{v_i} \geq T_{\phi_i}, where T_{\phi_i} is a confidence threshold associated with the filter \phi_i. Under the assumption that every relevant point of the filter will generate a reliable estimation, the following approximation is proposed to calculate T_{\phi_i}:

    T_{\phi_i} = \min \{ \lambda_{v_i}(x, y, t) : (x, y, t) \in P(\phi_i) \}    (6)

where P(\phi_i) represents the set of relevant points of the filter \phi_i [7]. Note the importance of having an adequate confidence measure when working with filters that are selective to spatio-temporal orientations.

3.2. Estimation of a motion pattern

In this section, the methodology to integrate the estimations corresponding to the set of filters which compose a motion pattern is described. Let P_k be the k-th motion pattern detected in the sequence, and let \{\phi_i^k\}_{i=1,\ldots,L_k} be the set of L_k grouped filters in P_k. Let \Theta_k be the set of estimations v_i \sim N(\mu_{v_i}, \Sigma_{v_i}) obtained from \{\phi_i^k\}_{i=1,\ldots,L_k} which are above the confidence threshold. The integration is performed on the basis of the linear combination

    v_k = \sum_{v_i \in \Theta_k} \alpha_i v_i    (7)

with v_k representing the velocity at the point (x, y, t) of the motion pattern P_k, and \alpha_i given by

    \alpha_i = \frac{\|\mu_{v_i}\| \, \lambda_{v_i}}{\sum_{v_j \in \Theta_k} \|\mu_{v_j}\| \, \lambda_{v_j}}    (8)
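A compact sketch of equations (2), (3) and (5), assuming the derivative fields of a filter's odd response are already available at the R points of a neighborhood. The function name and the toy derivative data are illustrative; the default parameters match the synthetic experiments of Section 4.1 (gamma_1 = 0, gamma_2 = 1, lambda_p = 1e-5, with Lambda_p^{-1} = lambda_p I).

```python
import numpy as np

def filter_velocity(fx, fy, ft, w, g1=0.0, g2=1.0, lam_p=1e-5):
    """Sketch of eqs. (2)-(5): mean, covariance and confidence of the
    velocity estimate of one filter response over one neighborhood.
    fx, fy, ft: partial derivatives at the R neighborhood points; w: weights."""
    S_inv = lam_p * np.eye(2)              # prior term Lambda_p^{-1} = lam_p*I
    rhs = np.zeros(2)
    for fxr, fyr, ftr, wr in zip(fx, fy, ft, w):
        norm = g1 * (fxr**2 + fyr**2) + g2
        M = np.array([[fxr**2, fxr * fyr], [fxr * fyr, fyr**2]])
        d = np.array([fxr * ftr, fyr * ftr])
        S_inv += wr * M / norm             # accumulates eq. (3)
        rhs += wr * d / norm               # sum inside eq. (2)
    Sigma = np.linalg.inv(S_inv)
    mu = -Sigma @ rhs                      # eq. (2)
    conf = np.linalg.eigvalsh(S_inv).min() # eq. (5): smallest eigenvalue
    return mu, Sigma, conf

# Toy derivatives consistent with a translation at v = (1, 0): ft = -fx.
rng = np.random.default_rng(2)
fx = rng.standard_normal(25); fy = rng.standard_normal(25)
ft = -fx
w = np.full(25, 1.0 / 25)
mu, Sigma, conf = filter_velocity(fx, fy, ft, w)
print(np.round(mu, 3))  # close to [1, 0]
```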

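The combination of equations (7)-(8) can be sketched as follows, under the independence assumption discussed in the text. The per-filter means, covariances and confidences below are hypothetical values; the weights grow with both the amount of motion ||mu_i|| and the reliability lambda_i, and are normalized to sum to one.

```python
import numpy as np

def combine_pattern(mus, sigmas, lams):
    """Sketch of eqs. (7)-(8): linear combination of the per-filter Gaussian
    velocity estimates belonging to one motion pattern."""
    scores = np.array([np.linalg.norm(m) * l for m, l in zip(mus, lams)])
    alphas = scores / scores.sum()                  # eq. (8): sum to 1
    mu_k = sum(a * m for a, m in zip(alphas, mus))  # mean of v_k
    sigma_k = sum(a**2 * s for a, s in zip(alphas, sigmas))  # covariance
    return mu_k, sigma_k, alphas

# Three hypothetical filter estimates of the same motion, v ~ (1, 0).
mus = [np.array([1.05, 0.02]), np.array([0.95, -0.03]), np.array([1.0, 0.0])]
sigmas = [0.01 * np.eye(2)] * 3
lams = [2.0, 1.0, 4.0]   # confidence of each estimate, eq. (5)
mu_k, sigma_k, alphas = combine_pattern(mus, sigmas, lams)
print(np.round(mu_k, 2))  # combined velocity near (1, 0)
```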


[Figure 2 panels: original frame, motion patterns, optical flow.]
Fig. 2. Results with synthetic sequences.

In this equation, the norm \|\mu_{v_i}\| measures the "amount of motion" detected at this point by the filter \phi_i, while \lambda_{v_i} measures the reliability of the estimation v_i (equation (5)). The denominator in (8) guarantees that \sum_{\Theta_k} \alpha_i = 1. If we assume that the v_i are independent variables, v_k will be a random variable with a Gaussian distribution with mean \mu_{v_k} = \sum_{\Theta_k} \alpha_i \mu_{v_i} and covariance \Sigma_{v_k} = \sum_{\Theta_k} \alpha_i^2 \Sigma_{v_i}.

3.3. Multiple velocities representation

The motion patterns separate the relevant motions present in a given sequence; therefore, they are an adequate tool to represent multiple velocities at the same location. Thus, our scheme obtains the set of velocities v at a given point (x, y, t) directly from the set of estimations calculated for each motion pattern:

    v = \{v_k\}_{k=1,\ldots,K}    (9)

where K is the number of motion patterns detected in the sequence, and v_k is the optical flow estimation at the point (x, y, t) of the k-th motion pattern P_k. Note that, due to the use of confidence measures, we will not always have K estimations at each given point.

4. RESULTS

In this section, the results obtained with real and synthetic sequences are shown to prove the performance of our model.

4.1. Synthetic sequences

Figure 2 shows two synthetic sequences which have been generated from Gaussian noise of mean 0 and variance 1. In this case, we have used the values \gamma_1 = 0, \gamma_2 = 1 and \lambda_p = 10^{-5} (with \Lambda_p^{-1} = \lambda_p I [9]) in equations (2) and (3). The spatial and temporal partial derivatives have been calculated using the kernel (1/12)(-1, 8, 0, -8, 1), the gradient constraints have been applied in a local neighborhood of size 5 × 5, and the weight vector has been fixed to (0.0625, 0.25, 0.375, 0.25, 0.0625) [10]. The first example (figure 2(A)) shows a sequence where a background pattern with velocity (-1, 0) pixels/frame is occluded by a foreground pattern with velocity (1, 0). The second example (figure 2(B)) shows two motions with transparency: an opaque background pattern with velocity (1, 0), and a transparent foreground pattern with velocity (-1, 0). In both cases, the figure shows the central frame of the sequence, the motion patterns detected by the model (two in each case), and the optical flow estimated with our technique using the multiple-motion representation. Note that in the first example our technique obtains two velocities at the occlusion points; in a similar way, in the second example our methodology is able to estimate two velocities for each point of the frame.

Since we have access to the true motion field of the synthetic sequences, we can measure the performance of the proposed methodology. For this purpose, the following angular measure of error [10] between the correct velocity v_c and an estimate v_e will be used:

    e(v_c, v_e) = \arccos(\bar{v}_c \cdot \bar{v}_e)    (10)

where, given a velocity v = (v_x, v_y), we calculate \bar{v} as \bar{v} = (v_x, v_y, 1) / \sqrt{v_x^2 + v_y^2 + 1}. Since our examples have points with two velocities, the error is measured in relation to the nearest correct velocity at each point. Thus, if \Psi represents the set of correct velocities at the point (x, y, t), the measure of error is given by:

    E(v_e) = \min \{ e(v_e, v_r) : v_r \in \Psi \}    (11)

Table 1. Mean error comparison (techniques applied to the sequences in figure 2):

    Technique            A (occlusion)   B (transparency)
    Proposed technique    0.84°            0.44°
    Nestares              3.93°            7.76°
    Lucas & Kanade        4.79°           50.89°
    Horn & Schunck        2.66°           52.77°
    Nagel                 8.59°           45.81°
    Anandan              10.47°           47.78°
    Singh                 2.97°           45.27°
    Uras                  3.96°           57.86°
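The angular error of equations (10)-(11) translates directly into code. A small sketch; the velocity values below are illustrative, chosen to mimic an occlusion point with two correct velocities as in the sequence of figure 2(A):

```python
import numpy as np

def angular_error(vc, ve):
    """Eq. (10): angle (degrees) between the homogeneous 3-D directions
    built from the correct velocity vc and the estimate ve [10]."""
    a = np.append(vc, 1.0); a /= np.linalg.norm(a)
    b = np.append(ve, 1.0); b /= np.linalg.norm(b)
    return np.degrees(np.arccos(np.clip(a @ b, -1.0, 1.0)))

def multi_motion_error(ve, correct_set):
    """Eq. (11): error against the nearest correct velocity in Psi."""
    return min(angular_error(vc, ve) for vc in correct_set)

# Occlusion point with two correct velocities (illustrative values).
correct = [np.array([1.0, 0.0]), np.array([-1.0, 0.0])]
print(round(multi_motion_error(np.array([1.0, 0.0]), correct), 2))  # 0.0
print(round(multi_motion_error(np.array([0.9, 0.1]), correct), 2))
```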

Table 1 shows a comparison between our methodology and the seven techniques discussed in [10] (the mean error for the two examples in figure 2 is reported in each case). As Table 1 shows, the proposed method outperforms the other methods in all the cases (see in particular the example with transparency).

4.2. Real sequences

Figure 3 shows three examples with real sequences. In this case, we have used the values \gamma_1 = 0, \gamma_2 = 1 and \lambda_p = 0.5, with the same partial derivatives and weight parameters used in the synthetic case. For each example, the figure shows the central frame of the sequence and the optical flow estimated with our technique (for real image sequences we do not have the true motion field, so we can only show the computed flow field). The first example (figure 3(A)) corresponds to a double motion without occlusions where two hands are clapping. The second one (figure 3(B)) shows an example of occlusion where a hand is crossing over another one. In this case, where the occlusion is almost complete in some frames, the motion combines translation and rotation without a constant velocity. The third case shows an example of transparency where a bar is occluded by a transparent object (figure 3(C)). In all the cases, our methodology separates the two motions present in the sequence and estimates two velocities at the occlusion points.

[Figure 3 panels: original frame, optical flow.]
Fig. 3. Results with real sequences.

5. CONCLUSIONS

In this paper, a new methodology for optical flow estimation has been presented. The proposed technique is able to represent multiple motions on the basis of a new frequency-domain approach capable of detecting "motion patterns" (that is, clusters of spatio-temporal filter responses with continuity in their motion). A methodology to obtain the optical flow corresponding to a spatio-temporal filter response has been proposed, using confidence measures to retain only reliable estimations. A probabilistic combination of the velocities corresponding to the set of filters clustered in a given motion pattern has also been proposed.
The use of "motion patterns" has made it possible to represent multiple motions, while the combination of estimations from different filters, together with the confidence measures, has reduced the initial aperture problem. The technique has been illustrated on several data sets; real and synthetic sequences combining occlusions and transparency have been tested. In all the cases, the final results show the consistency of the proposed algorithm.

6. REFERENCES

[1] B.K.P. Horn and B.G. Schunck, "Determining optical flow," Artificial Intelligence, vol. 17, pp. 185–203, 1981.

[2] D.J. Heeger, "Model for the extraction of image flow," Journal of the Optical Society of America A, vol. 4, no. 8, pp. 1455–1471, 1987.

[3] O. Nestares and R. Navarro, "Probabilistic estimation of optical flow in multiple band-pass directional channels," Image and Vision Computing, vol. 19, no. 6, pp. 339–351, 2001.

[4] B.G. Schunck, "Image flow segmentation and estimation by constraint line clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, no. 10, pp. 1010–1027, 1989.

[5] H. Nagel and W. Enkelmann, "An investigation of smoothness constraints for the estimation of displacement vector fields from image sequences," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 8, pp. 565–593, 1986.

[6] M.J. Black and P. Anandan, "The robust estimation of multiple motions: parametric and piecewise-smooth flow fields," Computer Vision and Image Understanding, vol. 63, no. 1, pp. 75–104, 1996.

[7] J. Chamorro-Martinez, J. Fdez-Valdivia, J.A. Garcia, and J. Martinez-Baena, "A frequency-domain approach for the extraction of motion patterns," Proceedings of ICASSP 2003, in press, 2003.

[8] R. Rodriguez-Sanchez, J.A. Garcia, J. Fdez-Valdivia, and X.R. Fdez-Vidal, "The RGFF representational model: A system for the automatically learned partitioning of 'visual patterns' in digital images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 10, pp. 1044–1072, 1999.

[9] E.P. Simoncelli, E.H. Adelson, and D.J. Heeger, "Probability distributions of optical flow," Proceedings of IEEE CVPR '91, pp. 310–315, 1991.

[10] J.L. Barron, D.J. Fleet, and S.S. Beauchemin, "Performance of optical flow techniques," International Journal of Computer Vision, vol. 12, no. 1, pp. 43–77, 1994.
