Background Modelling Using Edge-Segment Distributions

We propose an edge-segment-based statistical background modelling algorithm that detects moving edges, and thereby moving objects, in sequences from a static camera. Traditional pixel intensity-based background modelling algorithms face difficulties in dynamic environments since they cannot handle sudden changes in illumination. They also produce ghosts when a sudden change occurs in the scene. To cope with these issues, edge-based features that are robust to intensity changes and noise have emerged. However, existing edge-pixel-based methods suffer from scattered moving edge pixels since they cannot exploit the edge shape. Moreover, traditional segment-based methods cannot handle edge shape variations and miss moving edges when they come close to the background edges. Unlike traditional approaches, our proposed method builds the background model from ordinary training frames that may contain moving objects, and it does not leave any ghosts behind. Moreover, our method uses an automatic threshold for every background edge distribution for matching. This makes our approach robust to illumination change, camera movement and background motion. Experiments show that our method outperforms the compared methods and detects moving edges efficiently despite the above-mentioned difficulties.


Introduction
For the past few decades, extensive investigation and analysis have been devoted to the detection of moving objects, since it has applications in a variety of disciplines. Vision-based traffic control systems [1, 2], video surveillance [3, 4], video segmentation [5, 6, 7], video coding [8], video indexing [9] and human behaviour analysis [9, 10] are a few examples. The simplest, and most popular, method for moving object detection is background subtraction, where moving objects are obtained from the difference image between the current frame and the background model. Existing approaches to background model initialization assume that a sequence free of motion is available prior to building the background model [3, 11]. In the case of a surveillance area, a busy street or a public place, it is very difficult to collect training background frames without any moving objects. A good background model should absorb all the changes in the background scene over time. Several feature-based approaches have been proposed to model the dynamic change in the background [12]-[15]. We classify the background modelling methods, according to the type of feature being used, into two main groups: pixel-based methods and edge-based methods.
Existing pixel-based background modelling algorithms model every background pixel individually. Several techniques [16]-[21] exist to create a model of each pixel. In these methods, once the colour likelihood of the input frame is computed, the pixels that deviate from the background model are labelled as foreground pixels. Modelling every background pixel is difficult since the intensity feature is prone to illumination change. Moreover, multiple colours can be observed at certain locations due to repetitive object motion, shadows, noise or reflectance from other objects [15]. This effect becomes worse in outdoor environments due to weather conditions, reflectance, motion in the background (e.g., waving tree branches) and unintentional camera motion. To adapt to this changing environment, the background model needs to be updated in every frame with an adaptation rate. However, these methods do not consider the object's motion when selecting the update rate; rather, they set a common rate for updating every background pixel. Thus, pixel intensity-based moving object detection methods leave ghosts behind them, especially when a sudden change occurs for slowly moving objects [22]. Although some statistical techniques have been used to overcome the ghost effect, these techniques are susceptible to sudden illumination variations. Additionally, to segment the moving area, these methods need to set a threshold over the difference image. Choosing the optimum threshold value is application dependent and very difficult to achieve. Hence, pixel-based methods [13]-[15] suffer from multi-modal distributions in dynamic environments and from sensitivity to illumination changes and noise.
On the other hand, edge-based methods rely on edges, a feature that is less sensitive to intensity changes and noise. This feature overcomes the limitations of the pixel intensity-based methods. Moreover, edge-based methods do not leave ghosts [1, 3, 5]. Edges, however, have position and shape variations, as illustrated in Figure 1. Nevertheless, because edge-based methods work with fewer pixels, they can afford costlier and more robust algorithms within a similar computational time.
Hence, edge features are useful tools for modelling the environment within their limitations. Research has been carried out to detect moving objects using edge features. However, existing edge-based methods use edge differencing. Consequently, by treating every edge pixel individually, they suffer from random noise. Pixel-by-pixel matching of edge points is not suitable due to its higher computational cost. Additionally, edges extracted from each frame are not always consistent across frames. Kim and Hwang detected moving objects from image sequences by using an edge differencing method [5]. However, their method does not update the background, which results in a higher false alarm rate. Jain et al. [19] proposed a method that models the background based on a sub-pixel edge map that represents the edge position and orientation using a mixture of Gaussian models. Their method has a high computational cost due to the increased number of Gaussians, which require an update at every frame. Dailey et al. [15] compute the moving object from three consecutive frames without using any background. However, the method matches exact edge pixels and thus cannot detect moving edge pixels in the presence of random noise or camera movement. Furthermore, the edge-segment-based approach introduced by Hossain et al. [3] uses initial motion-free training frames for generating the background model. Moreover, their method uses a common global threshold for matching every background segment. Background edges show shape and size variation across frames. In addition, the variations for different edges are not the same. Without considering this variation from the environment, the detectors' output cannot be reliable. We

Proposed Method
Our statistical model attempts to predict the edge's behaviour, i.e., its shape and position changes, and encodes it into a statistical map. Therefore, when a new edge comes to the scene, we test it against the previously observed edges' behaviour and determine whether it fits the previous edges or is a new one. Moreover, we use an adaptive comparison framework for the edges (i.e., the threshold for the matching score, the search window and a voting scheme to distinguish between moving edges and background edges that share the same region) that increases the accuracy of the detection. Additionally, the statistical model allows us to suppress the contribution of the moving objects to the background model, leaving only the background edges' contribution in the model. In summary, the background modelling method is divided into five parts (as shown in Figure 2): (1) First, we create the frame statistical model, which is a kernel-density distribution built from the edge maps. (2) Then, the frame statistical distributions are accumulated using temporal information. (3) Next, the accumulation is adaptively thresholded, allowing us to use non-ideal frames to learn the background. (4) Additionally, in each distribution region we create a set of Gaussians to model the colour and gradient magnitude information for the region. (5) Finally, we extract the moving edges as outliers from the statistical model. Furthermore, we present an abstraction of the method in Figure 3.

Statistical Modelling
To estimate the edges' behaviour, we first extract each image's edges using a Canny edge detector [23], and represent the set of edges extracted from frame $t$ with a binary edge map, $E^t$. We then construct the statistical map, SM, which is the set of all the background distributions, by accumulating a Gaussian function estimator $K(\cdot)$ over the binary edge maps $E^t$ of a set of frames that ranges from the initial frame $t_0$ to the final frame $t_f$. The estimator is evaluated at each pixel position $p$ of the edge map $E^t$, over the pixel positions $q$ that belong to the neighbourhood $N(p)$ of $p$, with a kernel of width $h$.
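The accumulation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the estimator is taken to be an unnormalised isotropic Gaussian, so that each edge pixel contributes $\exp(-\lVert p - q \rVert^2 / (2h^2))$ to the positions in its neighbourhood, and the statistical map is taken to be the plain sum of the per-frame results. The function names, the square neighbourhood and the `radius` and `width` parameters are hypothetical.

```python
import math


def gaussian_kernel(radius, width):
    """Unnormalised 2-D Gaussian weights over a square neighbourhood (assumed form)."""
    return {
        (dy, dx): math.exp(-(dy * dy + dx * dx) / (2.0 * width * width))
        for dy in range(-radius, radius + 1)
        for dx in range(-radius, radius + 1)
    }


def frame_distribution(edge_map, kernel):
    """Spread every edge pixel of one binary edge map over its neighbourhood."""
    rows, cols = len(edge_map), len(edge_map[0])
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            if not edge_map[y][x]:
                continue
            for (dy, dx), weight in kernel.items():
                yy, xx = y + dy, x + dx
                if 0 <= yy < rows and 0 <= xx < cols:
                    out[yy][xx] += weight
    return out


def statistical_map(edge_maps, radius=2, width=1.0):
    """Accumulate the per-frame distributions over all training frames."""
    kernel = gaussian_kernel(radius, width)
    rows, cols = len(edge_maps[0]), len(edge_maps[0][0])
    sm = [[0.0] * cols for _ in range(rows)]
    for edge_map in edge_maps:
        dist = frame_distribution(edge_map, kernel)
        for y in range(rows):
            for x in range(cols):
                sm[y][x] += dist[y][x]
    return sm
```

Because the assumed kernel is symmetric, spreading each edge pixel outwards gives the same result as summing, for every position $p$, over the edge pixels $q$ in $N(p)$.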

Adaptive Threshold
The distributions differ from two points of view: accumulation and motion (as shown in Figure 3). The accumulation of edges across frames reveals their variation and frequency (i.e., the rate of the edge's occurrence in consecutive frames). Moreover, the frequency indicates which distributions represent background and which represent foreground, since the two have distinct frequencies. For instance, the moving objects, which appear and disappear from the scene, create small peaks in the distribution, while the background edges create high peaks. Consequently, we can remove the spurious distributions based on the edge's frequency. On the other hand, the different motion of the edges creates wider or narrower distributions, e.g., edges with a lot of movement create spread distributions, while edges with little movement create sharp distributions. The creation of ad hoc distributions for each edge allows us to define accurate search regions for the edge matching process, and adaptive thresholds for each edge according to its characteristics. Moreover, the accurate search regions improve the information modelled by the Gaussians of colour and gradient magnitude, which enrich the background model further (see Section 2.3). We threshold the distributions from these two points of view: by using the accumulation to remove foreground, and by using the motion (through the standard deviation of each distribution) to improve the accuracy of the detection.

Accumulation threshold
To remove the distributions created by the moving objects, we assume that the moving objects have an average speed $v$ in pixels per frame. The inverse of the speed gives the number of frames that an object stays in the scene. Hence, we propose a threshold $T$ to remove such distributions, which combines the maximum value of the kernel function with the inverse of the moving objects' minimum average speed $v$. We remove a distribution from the SM as belonging to a moving object if its accumulation values are smaller than the threshold $T$; otherwise, we consider the distribution as background. In our experiments, we assumed that the objects have a minimum average speed of $v = 2/N$, where $N$ is the number of frames used for learning; that is, an object will not be stopped for more than half of the total frames used for learning the model.
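As a hedged sketch of this step, the threshold below is assumed to be the kernel maximum multiplied by the inverse of the minimum speed, $T = \max(K)/v$, which is one reading consistent with the description; the exact expression is not recoverable from the text. For simplicity the sketch thresholds the map position by position, whereas the method removes whole distributions. The function names are hypothetical.

```python
def accumulation_threshold(kernel_max, min_speed):
    """Assumed threshold: kernel maximum times the number of frames (1 / speed)."""
    return kernel_max / min_speed


def remove_moving_distributions(stat_map, threshold):
    """Zero out accumulated values that stay below the threshold."""
    return [
        [value if value >= threshold else 0.0 for value in row]
        for row in stat_map
    ]
```

With $N$ training frames and $v = 2/N$, this assumed threshold equals the accumulation that a stationary edge would reach in half of the training frames.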

Motion threshold
To threshold the distributions according to their motion, we need to compute the cutting point that represents a certain percentage of the distribution (given by $k\sigma$). First, we thin each distribution using Multi-Directional Non-Maximum Suppression [24] (the application of the non-maxima suppression algorithm in several directions and the combination of the results) to extract the centre (the maximum peak) of each distribution. We can then compute several moments of the distribution and approximate its cutting point through slices of the distribution that are orthogonal to its centre. We first define the ratio of the probabilities of two given points, $x_1$ and $x_2$, with probabilities $p_1$ and $p_2$, respectively, in terms of the standard deviation $\sigma$ of the distribution. This relation allows us to use the mean of the distribution ($p_0$ at position $x_0 = 0$) to define any point position $x_i$ as a function of the ratio of its probabilities. Thus, we can define the quantization step between two points of the distribution, and from it the cutting point for the percentage of the distribution determined by $k$ (in our experiments we use about 95% of the distribution by using $k = 2$). This cutting point is the distance in pixels from the mean of the distribution that defines where to prune the distribution. Furthermore, we use several points from each distribution (as samples from the orthogonal slice at each mean point) and average the resultant cutting points to refine the approximation. Consequently, we create a map with regions that represent the background. To apply this threshold, we check the distance of each pixel from the centre pixels of its distribution: if the distance is larger than the cutting point, we remove that pixel from the distribution.
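The cutting-point estimate can be illustrated with a short sketch, assuming that each orthogonal slice follows a zero-centred Gaussian profile $p(x) = p_0 \exp(-x^2 / (2\sigma^2))$. Under that assumption the ratio of the peak to a sample gives $x_i = \sigma \sqrt{2 \ln(p_0 / p_i)}$, so $\sigma$ can be recovered from any sample and the cutting point is $k\sigma$. The function names and the `(distance, value)` sample format are hypothetical, and the source's exact equations may differ.

```python
import math


def sigma_from_sample(peak_value, sample_value, distance):
    """Recover sigma from one sample of an assumed zero-centred Gaussian slice."""
    return distance / math.sqrt(2.0 * math.log(peak_value / sample_value))


def cutting_point(peak_value, samples, k=2.0):
    """Average the k-sigma cutting points estimated from (distance, value) samples."""
    cuts = [
        k * sigma_from_sample(peak_value, value, distance)
        for distance, value in samples
        if distance > 0 and 0 < value < peak_value
    ]
    if not cuts:
        raise ValueError("no usable samples below the peak")
    return sum(cuts) / len(cuts)
```

Samples at the peak itself or with zero value are skipped because the logarithm of the ratio is then zero or undefined.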

Gaussians of Colour and Gradient Magnitude
Furthermore, we add colour and gradient magnitude information to the regions that represent the background. For each pixel in the distribution, we create a set of Gaussians that model the colour and gradient magnitude information in that area (as shown in Figure 3). This increases the detection rate, as it avoids the over-elimination of moving edges (as shown in Figure 4). Given that our main goal is to create an intensity-robust method, we cannot directly use the RGB colour space to extract the colour information. Instead, we use the HSV colour space, and use the hue (H) and saturation (S) components to model the colour. Hence, we build two Gaussians, namely $G_H$ and $G_S$, to model the colour of each pixel. Additionally, we model the gradient magnitude (GM) with another Gaussian, $G_{GM}$. Each Gaussian $G_c$, with $c \in \{H, S, GM\}$, is defined by its mean $\mu_c$ and its standard deviation $\sigma_c$, which are computed from the values $x_c^t$ of type $c$ of the pixel at each frame $t$, over the $N$ frames.
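A minimal sketch of fitting the three per-pixel Gaussians is given below. It assumes the sample mean and the population standard deviation (normalised by $N$); the source does not state which normalisation it uses. The dictionary-based sample format and the function name are hypothetical, and hue is treated as an ordinary scalar, ignoring its circular nature.

```python
import statistics

COMPONENTS = ("H", "S", "GM")


def fit_pixel_gaussians(samples):
    """Fit one (mean, std) pair per component from per-frame values of one pixel.

    samples: list of dicts, one per training frame, with keys "H", "S", "GM".
    """
    model = {}
    for component in COMPONENTS:
        values = [sample[component] for sample in samples]
        # Population standard deviation (divides by N); an assumption.
        model[component] = (statistics.mean(values), statistics.pstdev(values))
    return model
```

For example, `fit_pixel_gaussians(frames)` returns a dictionary such as `{"H": (mean, std), "S": (mean, std), "GM": (mean, std)}` for one background pixel.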

Foreground Detection
We use the resultant distributions after the adaptive threshold operation and the set of Gaussians (of colour and gradient magnitude) as a background model to detect the moving objects in the scene. To detect the moving objects, we first obtain the edges in the incoming frames by using a Canny edge detector [23]. Then, we compare the extracted edges with the background model. Those edges that do not lie within a background distribution are considered moving objects. Moreover, we check the edges that are within a background distribution to test their colour and gradient magnitude information. Each pixel $p$ on the edge-segment (within the background region) casts a vote $v_c$ for each component $c \in \{H, S, GM\}$ to consider the pixel as background or foreground. The vote depends on the mean value $\mu_c(p)$ and the standard deviation $\sigma_c(p)$ at the position $p$, on the value $x(p)$ of the respective information component (H, S or GM) for the pixel $p$, and on a constant $k$ that defines the inclusion in the Gaussian for the matching. Then, if at least a given percentage of the votes (in our experiments we use 95% of the voting) for all the components, i.e., H, S and GM, consider the edge (within the background region) as background, we consider the whole edge as background; otherwise, we detect it as a moving edge.
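The voting step can be sketched as follows, under two labelled assumptions: a pixel votes "background" for a component when its value lies within $k$ standard deviations of the background mean, and the edge is accepted as background only when every component reaches the required fraction of votes. The default `k=2.0`, the data layout and the function names are hypothetical; the source does not give a value for $k$ in this step.

```python
COMPONENTS = ("H", "S", "GM")


def pixel_votes(values, model, k=2.0):
    """Vote per component: True when the value lies within k*sigma of the mean."""
    return {
        c: abs(values[c] - model[c][0]) <= k * model[c][1]
        for c in COMPONENTS
    }


def is_background_edge(edge_values, edge_models, k=2.0, min_fraction=0.95):
    """Decide whether an edge-segment inside a background region is background.

    edge_values: list of dicts (H, S, GM values), one per edge pixel.
    edge_models: list of dicts mapping each component to (mean, std),
                 one per edge pixel, taken from the background model.
    """
    counts = {c: 0 for c in COMPONENTS}
    for values, model in zip(edge_values, edge_models):
        for c, vote in pixel_votes(values, model, k).items():
            counts[c] += int(vote)
    needed = min_fraction * len(edge_values)
    # Assumption: every component must reach the required share of votes.
    return all(counts[c] >= needed for c in COMPONENTS)
```

An edge for which `is_background_edge` returns `False` would be reported as a moving edge, as would any edge that lies outside all background distributions.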

Experiments and Results
We tested our method on different sequences, including the PETS 2001 [25] and I2R [26] databases. These sequences have dynamic backgrounds that are representative of video surveillance environments. In these databases, the images have background motion, illumination change and noise. Our proposed method was able to detect almost all of the moving objects in the sequences.

Results
We evaluated the detection capabilities of our method against four other methods: Dailey et al. [15], Dewan et al. [27], Hossain et al. [3] and Kim and Hwang [5]. We trained and tested the algorithms on the PETS 2001 data sets. Specifically, we used the sequences in Data Set 3 (Testing Camera 1) and Data Set 4 (Testing Camera 1), which we chose because of their challenging environments. Figures 5(a) and (b) show Data Set 3, which has illumination variation due to the movement of the clouds in the sunny environment. Figures 5(c) and (d) show overexposed image sequences with illumination variation and reflection from the windows, cars and other moving objects. The ground truth for the selected frames is shown in the second row of Figure 5. In both sequences, people are walking around in the scene.
Since managing the background model is challenging, both Dailey et al. [15] and Dewan et al. [27], as shown in the third and fourth rows of Figure 5, detect moving objects without using any background. The former method uses an edge-pixel-based approach, while the latter uses an edge-segment-based approach. In both approaches, two edge maps are extracted from the edge difference images of three consecutive frames. Finally, the moving edges are extracted by applying a logical AND operation between them. However, these methods fail to detect slowly moving objects and thus cannot be used for real-time applications. The slowly moving clouds in Data Set 3 (DS3) and the moving objects (people and cars) in Data Set 4 (DS4) create illumination variation. In Figures 5(a) and (b), the clouds present a challenge as they change their shape slowly.
Although Hossain et al.'s [3] method uses an edge-segment structure (see the fifth row of Figure 5), the method relies on initial motion-free training frames for generating the background. Moreover, for background segment matching, they use a fixed threshold for all the background segments. Selecting a lower threshold results in matching only edges with small movement variation, whereas a higher threshold increases false background segment matching. Thus, the method gives false alarms. Kim and Hwang [5], in turn, detect moving objects from image sequences by using an edge differencing method. They compute current moving edges and temporary moving edges, and the moving edges are finally determined by applying a logical OR operation between them. Their method does not update the background model. Thus, the method cannot handle dynamic backgrounds and results in more false alarms. Their moving edge detection results are shown in the sixth row of Figure 5.
The proposed method overcomes these problems thanks to its statistical background model, which effectively utilizes the movement statistics of every background edge-segment. Moreover, the use of Gaussian colour and gradient magnitude distributions in the proposed method helps to recover moving edges that fall over the background distributions. Additionally, these distributions allow the proposed method to absorb flickering noisy edges as background edges. This adaptive behaviour (towards the flickering edges) in the background model is due to the adaptive thresholds and the statistical distributions that model the possible edge positions. Furthermore, the proposed method recovers a detailed shape and a clear boundary of the foreground, which improves its detection capabilities. We show the moving object detection of our proposed method in the last row of Figure 5.

Quantitative Evaluation
To evaluate the performance of the proposed system quantitatively, we compared the detected moving edge-segments with the ground truth, which was segmented by hand. We use three metrics for performance evaluation: Recall, Precision and Similarity, which are computed from the counts of correct and incorrect detections with respect to the ground truth. Table 1 shows the Precision and Recall of the proposed method on the five datasets tested. Here, the outdoor sequences DS3 and DS4 have illumination variation due to the movement of the clouds and to the underlying moving objects. The Bootstrap (BS), Airport (AP) and Shopping Mall (SM) sequences are indoor environments that have object reflection and background noise. These three indoor sequences are taken from the I2R dataset. A sample snapshot for each of the three indoor datasets, together with its ground truth and the detection result of the proposed method, is given in Figure 6. Figure 7 shows the Precision and Recall of the detected moving edges for the DS3 and DS4 datasets. In Figure 7, Precision, which measures the accuracy of detecting moving edges, is higher for the proposed method. This is due to the statistical background model, i.e., flexibility in matching is given to those segments that have high movement information. Moreover, the existing pixel-based method suffers from scattered moving edge pixels that are often mismatched, which eventually leads to a lower Precision value. Figure 8 illustrates the Similarity performance of the proposed method over the same datasets in comparison with four other methods. Similarity describes the overall edge detection accuracy as well as effectiveness. Due to its segment-based statistical nature, the proposed method overcomes these difficulties and gives superior performance in the outdoor scenes. The proposed method thus proved to be robust against dynamic illumination change in these environments.
We also observed similar performance for the indoor I2R dataset. Figure 9 illustrates the Similarity performance for the three indoor sequences. On the I2R data set [26], we compared the proposed method against five other methods. Li et al. [28] use a Bayesian framework for background subtraction (BBS). The mixture of Gaussians (MoG) [29] uses multiple weighted Gaussian distributions as a background model. The background neural network (BNN) [30] is a mixture of a probabilistic neural network and a winner-takes-all neural network. SOBS [31] is a self-organizing approach based on neural networks. Kim and Hwang [5] present a fast object segmentation algorithm that uses edges to extract the moving objects in a video sequence. The proposed method performs, on average, 11% better than the other methods. These sequences are particularly difficult due to the reflections and shadows cast on the indoor surfaces. Moreover, Figure 9 shows the superiority of our method in comparison to pixel-based methods. In general, as shown in Figures 8 and 9, the proposed method outperforms the compared methods, since it shows high values for both Precision and Recall. Thus, the proposed method is stable and more reliable than the other methods discussed in this paper.
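For reference, the three metrics can be computed from pixel counts as in the sketch below. Recall and Precision follow their standard definitions. The form of Similarity used here, true positives over the union of detections and ground truth, is an assumption based on common practice in background-subtraction evaluation and may not match the authors' exact expression. The function name is hypothetical.

```python
def edge_metrics(tp, fp, fn):
    """Return (recall, precision, similarity) from edge-pixel counts.

    tp: moving-edge pixels correctly detected (true positives)
    fp: background pixels detected as moving (false positives)
    fn: moving-edge pixels that were missed (false negatives)
    """
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Assumed Jaccard-style similarity; not confirmed by the source.
    similarity = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    return recall, precision, similarity
```

Unlike Recall and Precision taken separately, this assumed Similarity penalises both missed and spurious edge pixels in a single number.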

Conclusion
We presented a statistical edge-segment-based method to model the background and detect moving objects in dynamic environments. The proposed method builds statistical distributions for each edge-segment and uses the unique information of each edge-segment to compare it with other edges, resulting in a robust adaptive verification process. Thanks to these features, we overcome the most common edge problems, such as shape and position changes. Furthermore, these mechanisms can be incorporated into other edge-based methods to extend their functionality and make them robust in dynamic environments. The proposed statistical map can be used to split foreground edges that merge with the background, increasing the detection accuracy. Additionally, the proposed method explores the edge domain, which has not been researched as much as the pixel domain, for object detection. We found promising results that can be used in several applications, including surveillance in dynamic backgrounds and content-based video encoding.

Figure 1. Edges have changes in shape and position. (a) A sample scene with waving trees, and (b) its extracted edges. (c) The accumulation of 100 edge frames shows the variation of the edge's movement. In (c), the building's edges are thin, which depicts small movement, while the trees' thick edges indicate high movement variations.

Figure 2. A flow diagram of the proposed method.

Figure 3. An abstraction of the proposed method.

Figure 2): (1) First, we create the frame statistical model. It is a kernel-density distribution from the edge maps. (2) Then, the frame statistical distributions are accumulated using temporal information. (3) Next, the accumulation is adaptively thresholded, allowing us to use non-ideal frames to learn the background. (4) Additionally, in each distribution region we create a set of Gaussians to model the colour and gradient magnitude information for the region.

Figure 4. Current edge-based models have a problem: they over-eliminate moving edges.

Figure 7. Precision and Recall measure on the PETS 2001 database.

Here, TP is the total of true positives, FN is the total of false negatives, FP is the total of false positives and TN is the total of true negatives. These metrics indicate the total number of correct/incorrect and foreground/background edge pixel detections with respect to the ground truth. These metrics are commonly used in region-based methods, where they indicate the percentage of region matched; the same is true for pixel-based and edge-based methods. Recall gives the percentage of true positives detected. Precision gives the percentage of detected items that are true positives.
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Ministry of Education, Science and Technology (MEST) (No. 2012-0005523) and by a grant from the Kyung Hee University in 2011 (KHU-20111209).

References
[1] D. J. Dailey, F. W. Cathey and S. Pumrin (2000) An algorithm to estimate mean traffic speed using uncalibrated cameras. IEEE Transactions on Intelligent Transportation Systems. vol. 1. pp. 98-107.
[2] L. Zhi-fang and Y. Zhisheng (2007) A real-time vision-based vehicle tracking and traffic surveillance. Eighth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing. SNPD 2007. vol. 1. pp. 174-179. July

Table 1. Precision and Recall of the proposed method