INTERNATIONAL JOURNAL OF INFORMATION AND SYSTEMS SCIENCES
Volume 3, Number 3, Pages 383–391

© 2007 Institute for Scientific Computing and Information

A NEW IMAGE FEATURE FOR FAST DETECTION OF PEOPLE IN IMAGES
SON LAM PHUNG AND ABDESSELAM BOUZERDOUM

Abstract. In this paper, we present a new method of detecting visual objects in digital images and video. The novelty of the proposed method is that it differentiates objects from non-objects using image edge characteristics. Our approach is based on a fast object detection method recently developed by Viola and Jones. While Viola and Jones use Haar-like features, we propose a new image feature called edge density that can be computed more efficiently. When applied to the problem of detecting people and pedestrians in images, the new feature shows very good discriminative capability compared to Haar-like features.

Key Words. people detection, image edge analysis, object detection, video surveillance, pattern recognition.

1. Introduction

Detecting people and pedestrians in images and video has applications in video surveillance, road safety and many other areas. For example, Collins et al. [1] at CMU describe a multi-camera surveillance system that detects and tracks people over a wide area. Papageorgiou and Poggio [2] at MIT present a vision system that is used in the Daimler-Chrysler Urban Traffic Assistant to detect pedestrians. Haritaoglu and Flickner [3] at IBM develop an intelligent billboard that uses a camera to detect and count the number of people in front of the billboard.

There are two major approaches to detecting people in images and video. The first approach finds people using heuristic visual cues such as motion, background scene or color. Using motion, the difference between consecutive video frames is calculated to identify image regions that contain moving objects [4]. Using the background scene, a model is built to describe statistical properties of the background pixels, such as color, intensity, and spatial and temporal variations [3, 5, 6]; comparing this model with a new video frame determines whether a pixel belongs to the foreground or the background. The first approach can rapidly locate regions that likely contain people. However, these regions must be further processed using techniques such as face detection [4] or silhouette shape analysis [7]. Furthermore, this approach is of limited use when only a single input image is available.

The second approach scans the image window by window, where a window is a fixed-size rectangular region of the image. Pattern classifiers are trained to determine whether each window resembles the human body. This approach is computation-intensive, but it copes well with image variations. Papageorgiou and Poggio [2] proposed a pedestrian detection method that extracts Haar wavelet features from each 128-by-64 window and uses support vector machines to classify the features.
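The window-by-window scan described above is straightforward to sketch. The 128-by-64 window below matches the size used by Papageorgiou and Poggio; the 8-pixel step and single scanning scale are illustrative assumptions, not values from either cited system:

```python
import numpy as np

def iter_windows(image, win_h=128, win_w=64, step=8):
    """Yield (top, left, window) for every fixed-size window in the image."""
    H, W = image.shape[:2]
    for top in range(0, H - win_h + 1, step):
        for left in range(0, W - win_w + 1, step):
            yield top, left, image[top:top + win_h, left:left + win_w]

# Scanning a 640x480 image at a single scale already produces thousands
# of windows; scanning at multiple scales multiplies this further.
image = np.zeros((480, 640), dtype=np.uint8)
n_windows = sum(1 for _ in iter_windows(image))
```

Even at this coarse step the classifier must be evaluated thousands of times per image, which is why a fast per-window decision matters.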
Recently, Viola and Jones [8] developed a fast object detection method that relies on a cascade
Received by the editors October 3, 2006.
Table 1. People detection methods.

Author                      | Year | Is based on
Papageorgiou and Poggio [2] | 1999 | Haar wavelets, support vector machines
Oliver et al. [9]           | 2000 | eigen model of the background image
Haritaoglu et al. [7]       | 2000 | model of background image, classification of shape features
Branca et al. [10]          | 2002 | motion, Haar wavelets, 3-layer neural net
Rachlin et al. [6]          | 2003 | color segmentation
Patil et al. [4]            | 2004 | motion, face detection
Yang et al. [11]            | 2004 | depth, motion, color
Yoon and Kim [12]           | 2004 | skin color, background subtraction, Hausdorff-based shape comparison
Zhang and Kodagoda [13]     | 2005 | motion detection with laser range finder, edge-based template matching
Harasse and Bonnaud [5]     | 2006 | statistical background model, skin color, human model of head, skin and body regions

of classifiers. Each classifier uses one or more Haar-like features and is trained using an adaptive boosting algorithm. Viola and Jones' method has been applied successfully to the face detection problem. A list of people detection methods is shown in Table 1.

This paper presents an object detection method that relies on object edge characteristics to differentiate objects from non-objects. We propose a new image feature called edge density that can be computed very fast, and apply it to detect people and pedestrians in images. This paper is organized as follows. Section 2 describes the proposed object detection method and the image feature. Section 3 focuses on an application of the proposed method to people detection and analyzes the discriminative power of the edge density feature. Section 4 is the conclusion.

2. Edge Density Approach

Our method is based on the object detection method proposed by Viola and Jones [8]. For a given input image, object regions are detected by exhaustively scanning windows of the image. Because there can be over 200,000 windows in a typical image of size 640 × 480 pixels, a fast classification method is required to support real-time detection. Each window is processed by a cascade of strong classifiers to determine whether it is an object or a non-object. If a strong classifier considers the window a non-object, the window is immediately rejected; otherwise, the window is passed to the next strong classifier in the cascade. This means an object window must be processed by all strong classifiers, whereas a non-object window will typically be processed by only a small number of strong classifiers. Because the majority of windows in an input image are non-objects, the cascade structure reduces the average processing time per window. A strong classifier is made up of one or more weak classifiers, and each weak classifier uses exactly one image feature extracted from the window.
A strong classifier is so called because it has a lower error rate than a weak classifier. A strong classifier can be built from several weak classifiers using the AdaBoost algorithm [14]. The key idea of this algorithm is to force each weak classifier to focus on the training samples that the previous weak classifiers fail to process.
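The cascade with early rejection can be sketched as follows; the stage representation and all names here are illustrative, not the authors' implementation:

```python
def cascade_detect(window_features, stages):
    """Evaluate one window against a cascade of strong classifiers.

    `stages` is a list of (weights, weak_classifiers, threshold) triples.
    Each weak classifier maps the feature vector to 0 or 1; the strong
    classifier accepts when the weighted vote reaches its threshold.
    """
    for weights, weak_clfs, threshold in stages:
        score = sum(a * h(window_features) for a, h in zip(weights, weak_clfs))
        if score < threshold:
            return False   # rejected early: most non-object windows stop here
    return True            # survived every stage: report an object

# Example stage: a single weak classifier with unit weight.
stages = [([1.0], [lambda f: 1 if f[0] > 0.5 else 0], 0.5)]
accepted = cascade_detect([0.9], stages)
rejected = cascade_detect([0.1], stages)
```

Because most windows are rejected by the first one or two stages, the average cost per window stays far below the cost of evaluating every stage.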

2.1. New Image Feature based on Edge Density. The system by Viola and Jones uses Haar-like features, where a Haar-like feature is defined as the difference in the pixel sums of two adjacent regions. If a Haar-like feature is greater than a threshold, the weak classifier considers the window an object. Essentially, a salient Haar-like feature indicates that a window is an object if region A appears significantly darker or brighter than region B, where regions A and B are found through training. This strategy works well for objects with a defined inner structure, such as the human face. For example, it is a known fact that the eye region has a different brightness compared to its surroundings. However, for some objects, such as the human body (in a standing or walking pose), the dominant visual characteristics are the outer shape and edges. This observation motivates us to develop a new image feature that is based on edge density.

Figure 1. Left: an image window. Middle: the edge magnitude. Right: three edge density features, where each feature is the average edge magnitude in a specific subregion.

For a given window, an edge density feature measures the average edge magnitude in a subregion of the window (see Fig. 1). Let i(x, y) be a window and e(x, y) be the edge magnitude of the window. For a subregion r with top-left corner at (x_1, y_1) and bottom-right corner at (x_2, y_2), the edge density feature is defined as

(1)   f = \frac{1}{a_r} \sum_{x=x_1}^{x_2} \sum_{y=y_1}^{y_2} e(x, y)

where a_r is the region area, a_r = (x_2 - x_1 + 1)(y_2 - y_1 + 1). If the edge density feature is greater (or smaller) than a threshold, the weak classifier considers the window an object. This is equivalent to saying that a strong (or weak) presence of image edges in a subregion determines whether the window is an object. In a window, there are several thousand subregions, and hence several thousand features. The objective of system training is to identify the most salient subregions.

For the task of window scanning, there is a very efficient method to compute edge density features. Let I = {I(x, y)} be the input image of size H × W. Let E = {E(x, y)} be its edge magnitude; E(x, y) is found by applying edge operators such as Sobel or Prewitt on the entire image [15]. The edge magnitude is a combination of the edge strengths along the horizontal and vertical directions:

(2)   E(x, y) = \sqrt{E_h^2(x, y) + E_v^2(x, y)}

From the edge magnitude image E, we compute an edge integral image S. The pixel value at location (x, y) of S is defined as

(3)   S(x, y) = \sum_{x'=1}^{x} \sum_{y'=1}^{y} E(x', y')

That is, S(x, y) is the sum of edge magnitudes in the rectangular region {(1, 1) to (x, y)}. Given the edge integral image, the edge density feature of a subregion r = {(x_1, y_1), (x_2, y_2)} can be computed using only a few arithmetic operations:

(4)   f = \frac{1}{a_r} [S(x_2, y_2) + S(x_1 - 1, y_1 - 1) - S(x_2, y_1 - 1) - S(x_1 - 1, y_2)]
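Equations (3) and (4) translate directly into a few lines of NumPy. Padding the integral image with a leading row and column of zeros is an implementation convenience assumed here so that the x_1 - 1 and y_1 - 1 lookups need no special-casing at the border:

```python
import numpy as np

def edge_integral(E):
    """Integral image S of the edge-magnitude image E, as in eq. (3).

    S[x, y] is the sum of E over the rectangle (1,1)-(x,y); the extra
    zero row/column makes S[0, :] = S[:, 0] = 0.
    """
    S = np.zeros((E.shape[0] + 1, E.shape[1] + 1))
    S[1:, 1:] = E.cumsum(axis=0).cumsum(axis=1)
    return S

def edge_density(S, x1, y1, x2, y2):
    """Average edge magnitude over a subregion via eq. (4); 1-based coords."""
    area = (x2 - x1 + 1) * (y2 - y1 + 1)
    return (S[x2, y2] + S[x1 - 1, y1 - 1]
            - S[x2, y1 - 1] - S[x1 - 1, y2]) / area

# Toy 3x4 edge-magnitude image: the feature over rows 2-3, columns 2-4
# is the mean of {6, 7, 8, 10, 11, 12} = 9.
E = np.arange(1.0, 13.0).reshape(3, 4)
S = edge_integral(E)
f = edge_density(S, 2, 2, 3, 4)
```

After the one-off cumulative sums, every feature costs four lookups, three additions and one division, regardless of subregion size.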

Our approach requires computation of the edge magnitude image E before scanning occurs. Subsequently, each edge density feature involves only one subregion, whereas each Haar-like feature involves at least two subregions. Hence, if the same number of features is used, the proposed approach can be expected to run faster than Viola and Jones' system. In Section 3, we shall study the classification performance of the new image feature.

2.2. Selecting the Most Salient Feature. A weak classifier is built by selecting the best feature from a pool of several thousand features. This section describes the feature selection technique.

In a given training set, let w_1^+, w_2^+, ..., w_M^+ be the weights of the M training object patterns (i.e. positive patterns). Let w_1^-, w_2^-, ..., w_N^- be the weights of the N training non-object patterns (i.e. negative patterns). Let w^+ be the sum of all weights for object patterns, w^+ = \sum_{i=1}^{M} w_i^+, and let w^- be the sum of all weights for non-object patterns, w^- = \sum_{i=1}^{N} w_i^-. During training, we can modify individual weights, but the sum of w^+ and w^- must be kept equal to 1.

Given an edge density feature f that corresponds to a subregion r, we first compute the weighted cumulative histograms c^+(θ) and c^-(θ) for the object and non-object patterns. There are two possible decision rules: (1) object if f > θ, and non-object otherwise; (2) object if f ≤ θ, and non-object otherwise. Here, θ is a threshold value. The error rate for the first decision rule is

(5)   e_1(θ) = w^- + c^+(θ) - c^-(θ)

The error rate for the second decision rule is

(6)   e_2(θ) = w^+ - c^+(θ) + c^-(θ)

Note that the sum of e_1(θ) and e_2(θ) is equal to 1. Of the two decision rules, we select the one that gives the smaller error, e(θ) = min[e_1(θ), e_2(θ)]. The error rate using feature f is the minimum value of e(θ) across the range of θ. Finally, from the feature pool we choose the feature that gives the minimum error.
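A minimal sketch of this threshold search follows. The set of candidate thresholds is assumed to be supplied by the caller (the paper does not specify how θ is enumerated), and the function names are illustrative:

```python
import numpy as np

def best_threshold(f_pos, w_pos, f_neg, w_neg, thresholds):
    """Pick the decision rule and threshold minimising the weighted error.

    f_pos/f_neg: feature values of object / non-object patterns.
    w_pos/w_neg: their weights (all weights sum to 1 overall).
    Implements e1 and e2 of eqs. (5)-(6) via weighted cumulative sums.
    """
    w_plus, w_minus = w_pos.sum(), w_neg.sum()
    best = (1.0, None, None)              # (error, threshold, rule)
    for t in thresholds:
        c_pos = w_pos[f_pos <= t].sum()   # weighted cumulative histogram c+(t)
        c_neg = w_neg[f_neg <= t].sum()   # weighted cumulative histogram c-(t)
        e1 = w_minus + c_pos - c_neg      # rule 1: object if f > t
        e2 = w_plus - c_pos + c_neg       # rule 2: object if f <= t
        for e, rule in ((e1, '>'), (e2, '<=')):
            if e < best[0]:
                best = (e, t, rule)
    return best

# Perfectly separable toy data: positives above 0.5, negatives below.
f_pos, w_pos = np.array([0.8, 0.9]), np.array([0.25, 0.25])
f_neg, w_neg = np.array([0.1, 0.2]), np.array([0.25, 0.25])
err, theta, rule = best_threshold(f_pos, w_pos, f_neg, w_neg, [0.5])
```

In the toy example, rule 1 at θ = 0.5 misclassifies nothing, so the returned error is 0 and the "object if f > θ" rule is selected.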

3. Experiments and Analysis

Having described the object detection approach, we now apply it to the problem of detecting people and pedestrians in images. The aim of this section is to study the process of building weak and strong classifiers, and the classification performance of the edge density feature.

3.1. Experiment Data. We collected a total of 622 images that contain people and pedestrians, and manually identified the coordinates of the people in these images. The images contain 817 people patterns. There are strong variations in the patterns: frontal view, side view, and people in standing, bending, walking and running poses. Geometric transformations (image flipping and shifting) were applied to generate 2000 people patterns, of which 1000 were used for training and 1000 for testing. We also extracted 2000 non-people patterns from a set of landscape images; half of the non-people patterns were used for training and the other half for testing. Examples of the people and non-people patterns are shown in Fig. 2.

Figure 2. Examples of people and non-people patterns.

The average aspect ratio (height/width) of the people patterns in our dataset is 2.86 : 1. Note that this aspect ratio covers children as well as people in running or striding poses. Based on this result, we selected a window size of 46 × 16 pixels for designing the classifiers. This window size is found to reduce the computation load while keeping sufficient visual detail for classification.

3.2. Analysis of Edge Density Features. A strong classifier is trained in several rounds. In each round, a weak classifier using exactly one edge density feature is formed. The weights of the training patterns are modified according to the AdaBoost algorithm [14] to put emphasis on the patterns that the previous weak classifier handles incorrectly. We trained a strong classifier for 50 rounds. The edge operator used is the difference operator. Figure 3a shows the error rates of the strong classifier and the weak classifiers as training progresses. The results show that the training error of the strong classifier decreases steadily with the number of training rounds. However, the error rates of individual weak classifiers fluctuate with an

Figure 3. Error rates of weak classifiers and a strong classifier on (a) the training set and (b) the test set.

Figure 4. Examples of selected edge density features, at boosting rounds 1, 3, 5, 7, 13 and 23.

upward trend. This trend is explained by the fact that each new weak classifier in essence focuses on a small subset of the training set; this subset contains "difficult" patterns that the previous weak classifiers cannot handle. After 30 training rounds, the strong classifier has an error rate of 0.079. Some edge density features selected by the strong classifier are shown in Fig. 4. These features indicate that the strong classifier mostly picks up the edge difference between the human body and its surroundings. The feature selected at round 7 reflects the fact that there are strong edges in the human head region.

The performances of the strong classifier and the individual weak classifiers on the test set are shown in Fig. 3b. The results show that even though the error rate of each weak classifier is high, the error rate of the strong classifier decreases steadily. In this case, there is little change in the error rate of the strong classifier after round 10. Using a validation set, we can detect when this occurs and stop training the strong classifier. At this point, we usually collect more data for training the next strong classifier and add it to the cascade.
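The reweighting that makes each round focus on the "difficult" patterns can be sketched as one AdaBoost update; the beta = err / (1 - err) scaling follows the standard discrete-AdaBoost formulation of Freund and Schapire [14], which we assume is the variant used here:

```python
import numpy as np

def adaboost_round(weights, labels, predictions):
    """One AdaBoost weight update after training a weak classifier.

    Correctly classified patterns are scaled down by beta = err/(1-err)
    while misclassified ones keep their weight; renormalising then shifts
    the weight mass onto the patterns the weak classifier got wrong.
    """
    err = weights[predictions != labels].sum()   # weighted error of this round
    beta = err / (1.0 - err)                     # assumes 0 < err < 0.5
    new_w = weights * np.where(predictions == labels, beta, 1.0)
    return new_w / new_w.sum()

# Four patterns with equal weight; the weak classifier misses pattern 1.
w = adaboost_round(np.full(4, 0.25),
                   np.array([1, 1, 0, 0]),    # true labels
                   np.array([1, 0, 0, 0]))    # weak classifier output
```

After the update the single misclassified pattern carries half of the total weight, so the next weak classifier is pushed to handle it.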

Using the threshold found by the AdaBoost algorithm, the strong classifier with 10 features has an error rate of 0.2035, a false positive rate of 0.1570, and a false negative rate of 0.2500. The strong classifier with 50 features has an error rate of 0.1740, a false positive rate of 0.1420, and a false negative rate of 0.2060.

By reducing the threshold of a strong classifier, we can reduce its false negative rate to, say, F_n = 0.0001, at the cost of an increased false positive rate F_p. If we put n strong classifiers in series, the (expected) overall false negative rate becomes 1 - (1 - F_n)^n, whereas the overall false positive rate is F_p^n. Clearly, a large n will give both a low false negative rate and a low false positive rate.

3.3. Comparison of Edge Operators. In this section, we compare the performance of different edge operators. Three edge operators were examined: the difference operator, the Sobel operator and the Prewitt operator. The convolution masks of these operators are shown in Table 2. The edge strengths along the horizontal and vertical directions are computed as

(7)   E_h = I * h_h   and   E_v = I * h_v

Table 2. Edge operators used for feature extraction.

Operator   | Horizontal mask h_h            | Vertical mask h_v
Difference | [1; -1]                        | [1, -1]
Sobel      | [1 2 1; 0 0 0; -1 -2 -1]       | [1 0 -1; 2 0 -2; 1 0 -1]
Prewitt    | [1 1 1; 0 0 0; -1 -1 -1]       | [1 0 -1; 1 0 -1; 1 0 -1]
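Equation (2) and the Sobel masks of Table 2 can be exercised with a small NumPy sketch. The "valid" border handling and the mask orientations are assumptions here (the paper does not state how image borders are treated):

```python
import numpy as np

def conv2_valid(I, h):
    """Plain 2-D convolution with 'valid' output (no SciPy dependency)."""
    h = np.flipud(np.fliplr(h))        # convolution flips the mask
    kh, kw = h.shape
    H, W = I.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (I[i:i + kh, j:j + kw] * h).sum()
    return out

# Sobel masks from Table 2; the vertical mask is the transpose of the
# horizontal one.
sobel_hh = np.array([[1., 2., 1.], [0., 0., 0.], [-1., -2., -1.]])
sobel_hv = sobel_hh.T

def edge_magnitude(I, hh, hv):
    """Eq. (2): E = sqrt(Eh^2 + Ev^2) with Eh = I * hh, Ev = I * hv."""
    Eh = conv2_valid(I, hh)
    Ev = conv2_valid(I, hv)
    return np.sqrt(Eh ** 2 + Ev ** 2)

# A vertical step edge: the magnitude should be large at the step and
# zero in the flat regions.
I = np.zeros((5, 5))
I[:, 2:] = 1.0
E = edge_magnitude(I, sobel_hh, sobel_hv)
```

On the step image only the vertical-edge response fires, so E peaks along the column containing the step and vanishes where the image is constant.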

Figure 5. Error rates of strong classifiers using three edge operators on (a) the training set and (b) the test set.

The performances of the strong classifiers that use different edge operators to extract the edge density features are shown in Fig. 5. This figure shows that, compared with the other operators, the difference operator leads to faster training and a lower error rate on the test set. After 50 training rounds, the error rates on the test set for the difference, Sobel and Prewitt operators are 0.1740, 0.2415 and

0.2290, respectively. Note that the difference operator has a smaller mask and hence can be applied very quickly to the entire input image.

3.4. Comparison of Edge Density and Haar-like Features. For comparison purposes, we trained two strong classifiers: one using only edge density features, and the other using only Haar-like features [8]. The two types of image features are illustrated in Fig. 6. A Haar-like feature is the difference of the intensity sums of two adjacent rectangles. In comparison, an edge density feature is the average edge magnitude in a region.
Figure 6. Haar-like features (grey region minus white region) and edge density features (average edge magnitude).
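Both feature types reduce to rectangle sums on an integral image, which makes the cost difference concrete: a two-rectangle Haar-like feature needs two rectangle sums on the intensity integral image, while an edge density feature needs one sum on the edge-magnitude integral image. The helper names below are illustrative:

```python
import numpy as np

def integral(img):
    """Zero-padded integral image; any rectangle sum then takes 4 lookups."""
    S = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    S[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return S

def rect_sum(S, x1, y1, x2, y2):
    """Sum over the rectangle (x1,y1)-(x2,y2), 1-based coordinates."""
    return S[x2, y2] + S[x1 - 1, y1 - 1] - S[x2, y1 - 1] - S[x1 - 1, y2]

def haar_two_rect(S_int, left, right):
    """Two-rectangle Haar-like feature: two rectangle sums."""
    return rect_sum(S_int, *left) - rect_sum(S_int, *right)

def avg_in_rect(S_edge, x1, y1, x2, y2):
    """Edge density feature: one rectangle sum, divided by the area."""
    return rect_sum(S_edge, x1, y1, x2, y2) / ((x2 - x1 + 1) * (y2 - y1 + 1))

# On a uniform 4x4 image the two halves of a Haar-like feature cancel,
# while the average over the whole region equals the constant value.
img = np.ones((4, 4))
S = integral(img)
haar = haar_two_rect(S, (1, 1, 4, 2), (1, 3, 4, 4))
density = avg_in_rect(S, 1, 1, 4, 4)
```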


Figure 7. Error rates of strong classifiers that use Haar-like features and edge density features on (a) the training set and (b) the test set.

The performances of the two strong classifiers on the training set and the test set are shown in Fig. 7. The figure shows that the training error decreases faster using edge density features. For example, after 10 rounds the training error is 0.1135 for the edge density feature and 0.2085 for the Haar feature. Furthermore, the test error is lower for the strong classifier that uses edge density features. After 50 training rounds, the best test error is 0.1605 for the edge density feature and 0.2230 for the Haar feature. These results on the people detection task demonstrate a clear improvement from the proposed image feature. We plan next to study the comparative performance of the full people detector and to extend our approach to directional image features.

4. Conclusion

A new method for detecting objects in images that relies on object edge characteristics is presented. We propose a new image feature called edge density that can be computed very efficiently. The edge density feature is found to have better discriminative capability than the Haar-like feature for the task of detecting people in images. The difference operator is found to outperform the Sobel and Prewitt operators in terms of speed and classification accuracy.

Acknowledgments

The authors thank Mr Markos Pratsas for taking part in data collection. This work is supported by the University of Wollongong Small Research Grant.

References
[1] R.T. Collins, A.J. Lipton, H. Fujiyoshi, and T. Kanade, "Algorithms for cooperative multisensor surveillance," Proceedings of the IEEE, vol. 89, no. 10, pp. 1456–1477, 2001.
[2] C. Papageorgiou and T. Poggio, "Trainable pedestrian detection," in International Conference on Image Processing, 1999, vol. 4, pp. 35–39.
[3] I. Haritaoglu and M. Flickner, "Attentive billboards," in International Conference on Image Analysis and Processing, 2001, pp. 162–167.
[4] R. Patil, P.E. Rybski, T. Kanade, and M.M. Veloso, "People detection and tracking in high resolution panoramic video mosaic," in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2004, vol. 2, pp. 1323–1328.
[5] S. Harasse, L. Bonnaud, and M. Desvignes, "Human model for people detection in dynamic scenes," in International Conference on Pattern Recognition, 2006, vol. 1, pp. 335–354.
[6] Y. Rachlin, J. Dolan, and P. Khosla, "Learning to detect partially labeled people," in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2003, vol. 2, pp. 1536–1541.
[7] I. Haritaoglu, D. Harwood, and L.S. Davis, "W4: real-time surveillance of people and their activities," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 809–830, 2000.
[8] P. Viola and M.J. Jones, "Robust real-time face detection," International Journal of Computer Vision, vol. 57, no. 2, pp. 137–154, 2004.
[9] N.M. Oliver, B. Rosario, and A.P. Pentland, "A Bayesian computer vision system for modeling human interactions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 831–843, 2000.
[10] A. Branca, M. Leo, G. Attolico, and A. Distante, "People detection in dynamic images," in International Joint Conference on Neural Networks, 2002, vol. 3, pp. 2428–2432.
[11] M.-T. Yang, Y.-C. Shih, and S.-C. Wang, "People tracking by integrating multiple features," in International Conference on Pattern Recognition, 2004, vol. 4, pp. 929–932.
[12] S.M. Yoon and H. Kim, "Real-time multiple people detection using skin color, motion and appearance information," in 13th IEEE International Workshop on Robot and Human Interactive Communication, 2004, pp. 331–334.
[13] Z. Zhang and K.R.S. Kodagoda, "Multi-sensor approach for people detection," in International Conference on Intelligent Sensors, Sensor Networks and Information Processing, 2005, pp. 355–360.
[14] Y. Freund and R.E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," Journal of Computer and System Sciences, vol. 55, no. 1, pp. 119–139, 1997.
[15] R.C. Gonzalez and R.E. Woods, Digital Image Processing, Prentice Hall, New York, 2002.

School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, Northfields Av, Wollongong, NSW 2522, Australia.
E-mail: phung@uow.edu.au and a.bouzerdoum@ieee.org
URL: http://www.elec.uow.edu.au/staff/sphung/


