当前位置:首页 >> >>



Detection of moving objects in video using a robust motion similarity measure
Hieu T. Nguyen, Marcel Worring, and Anuj Dev
Abstract — This correspondence deals with the segmentation of a video clip into independently moving visual objects. This is an important step in structuring video data for storage in digital libraries. The method follows a bottom-up approach. The major contribution is a new well founded measure for motion similarity leading to a robust method for merging regions. The improvements with respect to existing methods have been con?rmed by experimental results. Keywords— motion similarity, motion segmentation, content-based indexing, video analysis

I. I NTRODUCTION Video is becoming an important datatype in digital libraries. Besides the traditional verbal user queries, the library should support queries based on object shape and motion. Objects and their characteristics therefore form basic units for content-based retrieval and presentation of video. They are further useful for video compression and the creation of hypervideo documents. Our correspondence deals with motion segmentation, i.e. the decomposition of a video scene into independently moving visual objects. Starting from an oversegmentation of the scene, it merges regions based on motion similarity. The paper is structured as follows. In Section II we formulate the problem and provide a short review of existing literature. Our new region merging method is described in Section III. Section IV shows the results of applying the method on some standard test image sequences. II. P ROBLEM

Suppose there are K moving rigid objects in the scene, we want to recover their regions of projection C1 , ..., CK on the image plane. When a surface of a rigid moving object is planar or distant enough from the camera, its optic ?ow at position x = (x, y) is well modeled by the quadratic transformation [1]: vx vy = = a1 + a2 x + a3 y + a7 xy + a8 x2 a4 + a5 x + a6 y + a7 y2 + a8 xy


where ? = (a1 , . . . , a8) are the motion parameters of the moving surface. In existing literature the modelling of the optic ?ow of a region by a parametric model is used either in the direct form (1) or in combination with the intensity matching equation. In both cases we can consider it in the generalized form: y(x) = f(x; ?k ) + εx ?x ∈ Ck ; k = 1, ..., K (2)

priori and is to be estimated together with Ck . The term εx represents the measurement error. The problem can then be viewed as segmenting the data, which are described by multiple regression models, into groups so that data in each group is described by a common model. The challenge is that both Ck and ?k are not known a priori. First, a granularity level for regions has to be selected. Pixelwise classi?cation is inherently not reliable due to the aperture problem as pixels in homogeneous regions can be assigned to any model. Trying to overcome this problem recent methods start from homogeneous regions, usually obtained through intensity or color segmentation. Then regions are merged according to motion based criteria. Several criteria have been proposed [1], [8], [5], [2]. The earliest method [8] used a scaled Euclidean distance. This measure has been criticized to be sensitive to inaccuracy in parameter estimation as well as the choice of the scaling matrix [5], [2]. Furthermore, we have shown that the merging results depend on the choice of the origin of the coordinate system [6]. In [9] the merging decision is based on whether residuals in two regions can be expressed by one Gaussian distribution. We found that the measure tests for the difference in motion parameters of the regions as well as difference in noise variance. The latter test is, in fact, not needed. Altunbasak et al [2] proposed a merging procedure minimizing the residual over all regions. The global minimum corresponds to the maximum likelihood solution for both region labels and motion parameters. The problem is that the objective function usually has very many minima and the proposed technique ?nds a local minimum only. A good starting point is therefore required. An elaborate method was developed by Moscheni et al [5]. However, an asymmetric similarity measure is used. Moreover, the method depends on a large number of parameters that need to be set. In conclusion, existing merging criteria are still ad-hoc. To this end, we propose a new rigorous approach for de?nition of motion similarity and develop a new merging method utilizing the strengths of the methods of [9] and [2] and at the same time overcoming their shortcomings. III. N EW

A. Motion statistics based region similarity Let R1, ..., RM be the regions obtained from the initial oversegmentation. We assume that the initial segmentation is such that there are no regions occluding the object boundaries, hence, each Ri is a subregion of one Ck . Let θ i be the vector of motion parameters of Ri, then θi ∈ {?1 , ..., ?K }. Obviously, two regions undergo a common motion and therefore are supposed to be merged if and only if θ i = θj . In practice, however, we have ? i of θ i which is commonly to base our decision on an estimate θ

where y(x) is the measurement at pixel x. In (1) y would correspond to the vector (vx , vy ). f is a known function. ?k is the vector of motion parameters of region Ck which is unknown a
The authors are with the Intelligent Sensory Information Systems group, University of Amsterdam, Faculty of WINS, Kruislaan 403, NL-1098 SJ, Amsterdam, The Netherlands. Email: { tat,worring,anuj }@wins.uva.nl


obtained via least squares minimization: ? i = arg min Si (θ ) where Si (θ ) = θ |y(x) ? f(x; θ)|2 θ x∈Ri (3) In the presented method we ?rst apply the robust estimation technique given in [4] to detect outliers de?ned as pixels whose measurement is not described by the same model as the majority of pixels in the region. Then the parameters are reestimated for the clean data using least-square. Although the in?uence of the ? may be different from the true θ due to outliers is reduced, θ noise. As a consequence, two regions of the same object may turn out to have different estimated motion parameters. Therefore, inference about the equality of θ i and θ j should be made by means of a statistical test. We need to test the hypothesis : H0 : θ i = θ j versus H1 : θ i = θ j (4)

where ?i = min Si (θ ) S θ and ?ij = min[Si (θ ) + Sj (θ)] S θ

Hence, ? is proportional to the increase in the residual sum of squares when a common model is chosen for the two regions instead of the optimal model for each one. This statistic yields our measure for similarity between motions of two regions Ri and Rj . As an aside if we test the hypothesis: H0 : θi = θ j AND σi = σj versus H1 : θ i = θj OR σi = σj we then get the ?ij /nij ) ? ni log(S ?i /ni ) ? statistic used in [9]: Λ = nij log(S ? nj log(Sj /nj ). The drawback of Λ is that it is non-zero even if the motion parameters estimated in the two regions are equal. For simplicity we now focus on the case where f is linear with respect to θ . If not, all results below are applicable if Taylor’s approximations of f are made. We have shown that [6]: ?(i, j ) = = 1 ? ?j ) Hi (Hi + Hj )?1 Hj (θ ?i ? θ ?j ) (θ i ? θ 2σ2 1 ? ?j ) [H?1 + H?1 ]?1(θ ?i ? θ ? j ) (9) (θ i ? θ i j 2σ2

The problem is much like testing whether two sets of data
y θ1 θ2

where Hi is the hessian matrix of Si (θ ), which is constant if f is linear. This matrix is a by-product of minimization of Si (θ ). 1 It is important to note that 2σ 2 H? is the covariance matrix of i ? θ i. Moreover, when a merging decision is accepted, the optimal motion parameters and the hessian of the union Ri ∪ Rj can be obtained directly from those of Ri and Rj as follows [6]: Hij = Hi + Hj (10) We also want to assess how close the motion of Ri is to a hypothesized motion model with parameters ?. In this case ? is deterministic and its covariances are zero, eq. (9) then becomes: ?i , ?) = ?1 ( θ 1 ? ? i ? ?) (θ i ? ?) Hi (θ 2σ2 (11) ? ij = (Hi + Hj )?1 (Hi θ ? i + Hj θ ?j ) θ and




Fig. 1. Test if two sets of data points represent the same model.

points represent the same line ( see Figure 1). Let L be the likelihood function of the measurements in Ri and Rj , given θ i and θ j :
2 ?ni /2 2 ?nj /2 L(y(x), x ∈ Ri ∪ Rj |θi , θj ) = (2πσi ) (2πσj ) × 1 1 exp{?[ 2 Si (θ i ) + 2 Sj (θ j )]} 2σi 2σj (5) where σi and ni are the variance of the measurement errors and the number of pixels in Ri respectively. We construct the loglikelihood ratio statistic:

This expression coincides with the Mahalanobis distance be? i and the class center ?. tween θ B. Properties of the proposed measure
?1 1 ?1 1 It is important to note that 2σ + H? in (9), the 2 [(Hi j )] ? ? j , is the inverse of the covariance matrix of the vector θ i ? θ optimal scaling matrix. This contrasts the ad-hoc one used in [8]. The other desirable properties of ? are given in the following theorem, proofs of which can be found in [6]: Theorem 1: The similarity measure ?(i, j ) de?ned in (9) satis?es the following properties: 1. ?(i, j ) > 0; ?(i, i) = 0; ?(i, j ) = ?(j, i) 2. ? has a chi-squared distribution χ2 p (λ) with degree of freedom p equal to the number of the motion parameters, whose non-centrality parameter is:

?(i, j ) = ?2 log( sup L/ sup L) θ i =θ j θ i =θ j


If ? exceeds a given threshold T , H0 is rejected in favor of H1. It is easy to show that: Si (θ ) Sj (θ ) Si (θ ) Sj (θ) ? = min[ 2 + 2 ] ? min 2 ? min 2 σi σj σj θ θ σi θ (7)

Although in practice noise variance may show minor variation over the image, it is advantageous to assume it is constant over the whole image. That means σi = σj = σ. Then ?= 1 ? ?i ? S ?j ] [Sij ? S σ2 (8)


1 (θ i ? θj ) Hi (Hi + Hj )?1 Hj (θ i ? θj ) 4σ2


3. For the velocity model (1) ? is invariant to af?ne transformations of the coordinates in the image space.


The second property guarantees that ? well discriminates regions undergoing the same motion from those undergoing different motions. In the former case θ i = θj and, hence, λ = 0 and ? has a central chi-squared distribution. In the latter case λ > 0, the distribution of ? becomes non-central and the measure tends to be large as the chance it exceeds the threshold T is much higher. The third property guarantees the independence of the merging result from translation or rotation of the coordinate system which is not the case for some earlier methods [8]. The proposed measure is not a metric as required by many standard clustering methods. It satis?es the conditions of positivity and symmetry as shown above, but not the triangular inequality. Thus, ? is a good measure when an appropriate merging method is developed. C. Merging procedure Maximum likelihood approach for merging regions requires minimization of the residual sum of squares over all regions:

MERGING STAGE 2 ?i to the nearest cluster center ?z , using the 1. Assign each θ i distance measure ?1, de?ned in (11). 2. Update the cluster centers ?k so that the sum of differences between the center and the cluster members is minimized. As we showed in [6] the new ?k can be found by solving the following system of equations: (
zi =k

Hi )?k =
zi =k

?i Hi θ


S(z, ?1 , . . ., ?K ) =

Si (?zi ) =

(13) where z = (z1 , .., zM ) and zi ∈ {1, . . ., K } is the region label specifying the index of the global moving surface to which R i belongs. A similar approach was used in [2]. We now show that the above minimization can be viewed as a generalized version of standard K-means clustering and the algorithm used in [2] can be improved to save computations. Considering f is linear with respect to the motion parameters and applying Taylor’s theorem, it is easy to show that: ? i ? ? z ) H i (θ ? i ? ?z ) ?i + 1 (θ Si (?zi ) = S i i 2 (14)


|y(x) ? f(x; ?zi )|2

3. Repeat 1 and 2 until cluster membership is unchanged. Since the number of objects is speci?ed in the beginning, the value of variance of noise does not affect the ?nal result and we do not have to specify σ. The convergence in the second stage is guaranteed as the cost function always decreases. Actually, if starting from the same initial set of cluster centers, the second stage gives the same result as the two-step iterative procedure used in [2] does 1 . However, our algorithm requires much less ? i and Hi are already obtained computations, considering that θ from the least-squares minimization of Si . Note also, that the above algorithm merges non-adjacent regions as well, which is not the case for some methods [1], [9], [5]. IV. R ESULTS In Figures 2 and 3 we show the results of applying the proposed merging algorithm on standard test sequences. The initial segmentation was obtained with the morphological multiscale technique [7]. The results for Table Tennis and Flower Garden were obtained with optic ?ow matching. As measurements y(x) we used the dense optic ?ow ?eld, computed from two successive images using the hierarchical method in [3]. For the Calendar sequence we used the model, which is based on the linearized intensity matching equation [6]. The improvements due to the use of the new similarity measure are con?rmed by comparison with Figure 2c,d, in which we show the results of the existing methods in [9] and [2] applied for the same set of initial regions. More elaborate evaluation can be found in [6]. V. C ONCLUSION

?i is constant, the minimization of S boils down to miniSince S mization of S where: S (z, ?1 , .., ?K ) = = 1 2 σ

? i ? ?z ) Hi (θ ? i ? ?z ) (θ i i
i=1 M

2 i=1

?i , ?z ) ?1 ( θ i


This is, in fact, similar to the objective function in the K-means ? i and algorithm except that the Euclidean distance between θ ?zi is replaced with the Mahalanobis distance (11). Like traditional K-means, good initial cluster centers are required. The ad-hoc pixel-based technique used in [2] for deriving the initial set of global motions is very likely to miss the motion of foreground objects. We propose an algorithm encompassing two stages where the ?rst one performs hierarchical region merging and provides a good starting point for the generalized K-means iterative procedure in the second: MERGING STAGE 1 1. Specify K , the number of objects. Initialize each cluster with one region: Cm = {Rm }; m = 1, ..., M . 2. Merge the two adjacent clusters Ci and Cj with the smallest ? de?ned in (9). Repeat until the number of clusters is reduced to K .

We have proposed a new criterion for similarity of regions movement in a video scene based on a statistical test for equality of motion parameters. The uncertainty in parameter estimation is incorporated in an optimal way. Using this measure we have developed a new merging algorithm consisting of two stages. The agglomerative merging in the ?rst stage provides a good starting point for the second stage in which the regions are merged according to a K-means like algorithm. The improved performance over existing methods has been demonstrated on real sequences. As extracted objects and their motion parameters are accurate, they can be used for content-based video retrieval in digital libraries. R EFERENCES
[1] G. Adiv, “Determining 3d motion and structure from optical ?ows generated by several moving objects,” IEEE Trans. on PAMI, vol. 7, no. 4, pp. 384–401, 1985.
1 We note, by the way, that in [2, section 4.3] the mixed use of intensity matching and optic ?ow matching dose not guarantee convergence.






Fig. 2. a) One frame from Table Tennis and the initial segmentation, consisting of 23 regions; b) Merging result of the proposed method, K=4; c) Result of the method in [9]; d) Result of the method in [2].



Fig. 3. Merging result of the proposed method for a) Flower Garden, K=3; and b) Calendar, K=5. Object boundaries are marked by white lines.

[2] Y. Altunbasak, P.E. Eren, and A.M. Tekalp, “Region-based parametric motion segmentation using color information,” Graphical Models and Image Proc., vol. 60, no. 1, pp. 13–23, Jan. 1998. [3] J.R. Bergen, P. Anandan, K.J. Hanna, and R. Hingorani, “Hierarchical model-based motion estimation,” in Proc. 2nd European Conf. of Comp. Vision, 1992, pp. 237–252. [4] P.J. Huber, Robust statistic, John Wiley, New York, 1981. [5] F. Moscheni, S. Bhattacharjee, and M. Kunt, “Spatiotemporal segmentation based on region merging,” IEEE Trans. on PAMI, vol. 20, no. 9, pp. 897– 915, Sept. 1998. [6] H.T. Nguyen, M. Worring, and A. Dev, “Robust motion-based segmentation in video sequences,” Tech. Rep. 4, Intell. Sensory Inf. Sys. Group, Univ. of Amsterdam, Dec. 1998, available at http://carol.wins.uva.nl/?rein/isis.html. [7] P.Salembier, “Morphological multiscale segmentation for image coding,” Signal Processing, vol. 38, no. 3, pp. 359–386, Sept. 1994. [8] J.Y.A. Wang and E.H. Adelson, “Representing moving images with layers,” IEEE Trans. on Image Proc., vol. 3, no. 5, pp. 625–628, Sept. 1994. [9] L. Wu, J. Benois-Pineau, Ph. Delagnes, and D. Barba, “Spatio - temporal segmentation of image sequences for object-oriented low bit-rate image coding,” Signal Proc. : Image Comm., vol. 8, pp. 513–543, Sept. 1996.


In situ cell death detection kit
In situ cell death detection kit_基础医学_医药卫生_专业资料。细胞凋亡原位检测试剂盒 In situ cell death detection kit (Roche,cat no:11684817910) 注意:1....
笔记《Abnormal Crowd Behavior Detection using Socia...
笔记《Abnormal Crowd Behavior Detection using Social Force Model》_工学_高等教育_教育专区。笔记《Abnormal Crowd Behavior Detection using Social Force Model》 ...
社区发现Community Detection 算法_图文
社区发现(Community Detection)算法 社区发现(Community Detection)算法用来发现网络中的社区结构,也可以视为一种广 义的聚类算法。以下是我的一个 PPT 报告,分享给...
Anti-Factor Xa Activity Levels Detection (Anti-Xa A...
Anti-Factor Xa Activity Levels Detection (Anti-Xa Assay)_生物学_自然科学_专业资料。Anti-Factor Xa Activity Levels Detection (Anti-Xa Assay) 一, 血栓...
2100 detection error on hddo联想ThinkPad X240 维修...
2100 detection error on hddo联想ThinkPad X240 维修记_计算机硬件及网络_IT/计算机_专业资料。2100 detection error on hddo联想ThinkPad X240 维修记 ...
目标检测(Object Detection)原理与实现
目标检测(Object Detection)原理与实现_计算机软件及应用_IT/计算机_专业资料。目标检测的一篇文章目标检测(Object Detection)原理与实现基于阈值图像处理的目标检测 从...
【Key words】Energy detection; Adaptive detection; Detection delay 0 前言 信号检测技术在工程领域广泛应用。随着无线通信技术的不断发展,无线信号检测是很多 通信...
3. 基于距离动态的社区检测 (Community Detection Based On Distance Dynamics) 3.1 距离动态与用户定义的社区标准(Distance Dynamics versus User-defined Community ...
Object Detection with Discriminatively Trained Part...
Object Detection with Discriminatively Trained Part Based Models【中文译】【转】_机械/仪表_工程科技_专业资料。Object Detection with Discriminatively Trained Part...
A-Survey-of-Human-Face-Detection人脸检测方法研究大学毕业论文外文文献翻译及原文 - 毕业设计(论文) 外文文献翻译 文献、资料中文题目:人脸检测方...

All rights reserved Powered by 甜梦文库 9512.net

copyright ©right 2010-2021。