Vehicle recognition in acoustic sensor networks using multiple kernel sparse representation over learned dictionaries

Sparse representation–based classification and kernel methods have emerged as important methods for pattern recognition. In this work, we study the problem of vehicle recognition using acoustic sensor networks in real-world applications. To improve the recognition accuracy with noisy sensor data collected from challenging sensing environments, we develop a new method, called multiple kernel sparse representation–based classification, for vehicle recognition. In the proposed method, acoustic features of vehicles are extracted and mapped into a high-dimensional feature space using a kernel function that combines multiple kernels to obtain linearly separable samples. To improve the recognition accuracy, we incorporate the K-singular value decomposition dictionary learning method into the multiple kernel sparse representation–based classification framework. Vehicle recognition from an acoustic sensor network is then formulated as an optimization problem. Our extensive experimental results demonstrate that the proposed multiple kernel sparse representation–based classification method with learned dictionaries outperforms other existing methods in the literature on vehicle recognition from complex acoustic sensor network datasets.


Introduction
Wireless sensor networks (WSNs) have gained much attention recently due to advances in micro-electromechanical systems (MEMS) technology, which have facilitated the development of smart sensors. 1 WSNs consist of a large number of sensor nodes whose positions need not be pre-determined. 2 The protocols and algorithms of WSNs have self-organization capabilities. At the same time, sensor nodes can carry out simple computations locally and transmit the required data to the nodes responsible for fusion instead of sending raw data, which greatly reduces the power consumption of the network. Owing to this powerful self-organization and fault-tolerance capacity, a wide variety of applications are envisioned for sensor networks, 1 including habitat monitoring, health monitoring, battlefield surveillance, and target tracking.
Detection and classification of objects moving through the sensor field is an important task in many field applications. 3 For vehicle recognition, early work focused on military vehicle detection for battlefield surveillance. Nowadays, vehicle detection has become an important task for traffic monitoring and management. Recognition of different vehicle types, such as cars, motorbikes, buses, or trucks, provides detailed traffic statistics and useful information about road utilization. Since many characteristics of the vehicle can be inferred from the sound it generates, 4 it is feasible to recognize the type of the moving vehicle in acoustic sensor networks. [4][5][6][7] However, how to improve the robustness of vehicle recognition algorithms from sensor networks within complex and noisy environments remains a critical and challenging issue in practice.
The problem of vehicle classification in acoustic sensor networks is essentially a pattern recognition problem. In recent years, various classification methods have been proposed in this field to adapt to different situations and improve the recognition rate, such as maximum likelihood (ML), k-nearest neighbor (k-NN), support vector machine (SVM), 2,8 and decision tree (DT) 9 methods. Wright et al. 10 found that sparse representation-based classification (SRC) performs well on face recognition, especially in noisy and cluttered environments. Mei and Ling 11 proposed a robust visual tracking and vehicle classification approach using sparse representation and demonstrated its effectiveness on a vehicle tracking and classification task using outdoor infrared (IR) video sequences. Kernel methods are effective machine learning tools for real-world problems with nonlinear data structures. Gao et al. 12 applied kernel sparse representation-based classification (KSRC), which maps the data into a high-dimensional space, to image classification and face recognition and achieved state-of-the-art performance. However, the success of kernel methods often depends on the choice of an appropriate kernel and features. Specific kernel functions have been proposed for particular applications, such as text document categorization 13 and computational biology. 14 It is therefore critical to select a kernel function that fits the samples. Instead of selecting one specific kernel function, multiple kernel learning methods, which learn the kernel from the samples as a linear combination of base kernels, are more effective, especially for complex scenarios with different sources or modalities.
The sparse representation classification models above assume that a data sample can be represented as a sparse combination of atoms from a pre-specified or non-adaptive dictionary, which cannot represent a given class of signals efficiently. To address this issue, recent research has focused on designing dictionaries using learning methods. 15,16 Engan et al. 17 introduced the method of optimal directions (MOD) to find a dictionary and a sparse matrix that minimize the representation error; this method, however, suffers from relatively high complexity. To train a generic dictionary for sparse signal representation, Aharon et al. 18 developed the K-singular value decomposition (K-SVD) algorithm, which updates the dictionary atom-by-atom in a simple process rather than using a matrix inversion.
In this article, we combine dictionary learning with sparse representation to solve the multi-sensor vehicle classification problem, focusing on recognizing different types of vehicles. The dataset contains acoustic recordings observed at each individual sensor in a real-world experiment carried out near Twenty-Nine Palms, CA, in November 2001. 2 The features of vehicles are extracted using Mel-frequency cepstral coefficients (MFCC), which have proven efficient in acoustic signal recognition. Chitra and Sumalatha 19 used MFCC to extract the sound features of emergency vehicles and performed the classification and identification task using SVM; this approach achieved increased accuracy and reduced time delay for emergency response. Matthias and Rainer 20 presented a mobile sound classification system that extracts 13 MFCC from data collected by a microphone and classifies the sound with neural networks to recognize the sounds of emergency vehicles in road traffic.
In this work, we study whether this set of features can effectively be applied to vehicle recognition in transportation applications. Since acoustic signals gathered in a real-world setting are inherently complex, it is difficult to choose the best kernel function. It is better to have a set of kernel functions and let the algorithm select the best subset of kernels. 21 Therefore, we propose multiple kernel sparse representation-based classification (MKSRC), which combines several possible kernels into a single kernel function and optimizes the multiple kernel weights while training the KSRC to adapt to different cases. Meanwhile, in contrast to previous sparse representation approaches in which the dictionary is fixed by the training samples, 10,12 in this article, we update the dictionary by K-SVD to adapt to complex scenes.
The major contributions of this article lie in the following aspects. First, we have developed a new classification algorithm based on MKSRC and successfully applied it to vehicle recognition from acoustic sensor networks. Second, we have developed a new and effective multi-kernel weight update scheme based on gradient descent, enabling our multi-kernel representation to fit different input source characteristics. This source-adaptive representation scheme has demonstrated its unique advantages in our experiments. Third, we have proposed a K-SVD method for dictionary update instead of using a fixed dictionary obtained from the training samples as in existing methods. This new method is able to handle classification tasks within different and complex environments.
The remainder of this article is organized as follows. In section ''Framework of vehicle recognition,'' we present the framework of vehicle recognition. Section ''Sparse representation models'' explains the sparsity models and MKSRC. Our dictionary learning method is presented in section ''Dictionary learning methods.'' Experimental results and performance comparisons are provided in section ''Experimental results.'' Finally, conclusions and discussions on future work are provided in section ''Conclusion.''


Framework of vehicle recognition

Figure 1 shows our vehicle recognition framework using MKSRC, which has the following major components:

1. Pre-processing. The raw acoustic signals of vehicles are gathered from an acoustic sensor network. This pre-processing step is important for noise reduction. We use the constant false alarm rate (CFAR) detection method. 2 After CFAR detection, useful event series are converted into frames, and the default frame increment is half the frame length, as shown in Figure 1.
2. Feature extraction. An appropriate feature extraction method is important for classification. We use MFCC acoustic features. Acoustic signals often change quickly over time. Compared with linear prediction cepstral coefficients (LPCC), 22 MFCC is more extensively used because of its robustness. 23
3. Dictionary learning and classification. We construct an over-complete dictionary based on the MFCC features and map it into a high-dimensional feature space using a kernel function that combines multiple kernels. To establish the best possible representation for each member of this set under sparsity constraints, we update the dictionary columns by K-SVD in an iterative manner. 18 The recognition problem then becomes determining the sparse representation of the test sample over the learned dictionary that has the minimum representation error.

SRC
Sparse representation is a signal processing method that represents the main information of a signal using as few non-zero coefficients as possible. 10 For object recognition, our goal is to classify the test sample using labeled training data. Our central approach is to represent the test sample as a sparse linear combination of the training samples. Suppose we have l classes of objects, and let $D = [D_1, D_2, \ldots, D_l] \in \mathbb{R}^{m \times n}$ ($m < n$) be a set of n training samples in l classes, where $D_i = [d_{i,1}, d_{i,2}, \ldots, d_{i,n_i}] \in \mathbb{R}^{m \times n_i}$ ($i = 1, 2, \ldots, l$) contains the m-dimensional features of the training samples in the ith class. Suppose the test sample $y \in \mathbb{R}^{m \times 1}$ can be sparsely represented over the over-complete dictionary D. We can obtain the sparse vector x by solving the optimization problem

$$\hat{x} = \arg\min_x \|x\|_0 \quad \text{s.t.} \quad y = Dx \qquad (1)$$

where $\|x\|_0$ is the $\ell_0$ norm, which counts the non-zero coefficients. While equation (1) is an NP-hard combinatorial optimization, Candes 24 proved that the $\ell_0$ norm can be substituted by the $\ell_1$ norm as an approximate solution if the solution of equation (1) is sparse enough

$$\hat{x} = \arg\min_x \|x\|_1 \quad \text{s.t.} \quad y = Dx \qquad (2)$$

In fact, since real-world data are often noisy, it may not be possible to express the test sample exactly as a sparse combination of the training samples. 10 The model in equation (1) can be rewritten as

$$y = Dx + z \qquad (3)$$

where $z \in \mathbb{R}^{m \times 1}$ is a noise term with bounded energy $\|z\|_2 < \epsilon$. Thus, the sparse representation model can be modified as

$$\hat{x} = \arg\min_x \|x\|_1 \quad \text{s.t.} \quad \|y - Dx\|_2 \le \epsilon \qquad (4)$$

and, generally, model (4) is transformed into the optimization problem over $J(x)$

$$J(x) = \frac{1}{2}\|y - Dx\|_2^2 + \lambda \|x\|_1 \qquad (5)$$

where the first part is the residual and the scalar regularization parameter $\lambda > 0$ balances the sparsity of the solution against the fidelity of the approximation to y. Solvers for equation (5) are readily available; we use orthogonal matching pursuit (OMP) 25 to solve the sparse minimization problem.
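The sparse coding step can be illustrated with a minimal OMP implementation; this is a simplified sketch (function name and test data are ours), not the exact solver configuration used in the experiments:

```python
import numpy as np

def omp(D, y, sparsity):
    """Orthogonal matching pursuit: greedily select atoms of D until
    at most `sparsity` non-zeros are used, so that y ~ D @ x."""
    n = D.shape[1]
    residual = y.copy()
    support = []
    x = np.zeros(n)
    coef = np.zeros(0)
    for _ in range(sparsity):
        # Pick the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k not in support:
            support.append(k)
        # Least-squares fit of y on the selected atoms.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x

# Toy check: a signal built from two atoms of a random dictionary.
rng = np.random.default_rng(0)
D = rng.standard_normal((50, 100))
D /= np.linalg.norm(D, axis=0)              # unit-norm atoms
x_true = np.zeros(100)
x_true[[3, 17]] = [1.5, -2.0]
y = D @ x_true
x_hat = omp(D, y, sparsity=2)
```

For a sufficiently sparse signal over an incoherent dictionary, OMP recovers the representation to within numerical precision, which is exactly the regime equation (5) targets.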
After sparsely coding y over D via $\ell_1$-norm minimization, we obtain the solution, denoted by $\hat{x}$, and the classification result can be obtained by computing the residual for each class

$$r_i(y) = \|y - D\,\delta_i(\hat{x})\|_2, \quad i = 1, 2, \ldots, l \qquad (6)$$

where $\delta_i(\hat{x})$ keeps only the coefficients of $\hat{x}$ belonging to class i. We then classify y into the object class that minimizes the residual error.

The kernel method
To ensure ideal classification performance, the vectors $D_i$ in the dictionary should be uncorrelated. In practice, however, underlying similarities between the extracted features of different vehicles may introduce correlation among the $D_i$ and affect the final result. As in the kernel methods used with SVM, 26 our kernel method handles linearly non-separable problems through a nonlinear mapping. A kernel is called a Mercer kernel if it satisfies Mercer's condition: continuous, symmetric, and positive semi-definite. 27 Suppose x and x' are two vectors in the input space $\mathcal{X}$ and $\phi$ is a nonlinear function mapping the input space $\mathcal{X}$ to the feature space $\mathcal{F}$. A Mercer kernel k can then be expressed as

$$k(x, x') = \langle \phi(x), \phi(x') \rangle \qquad (7)$$

where $\langle \cdot, \cdot \rangle$ denotes the dot product. The kernel transforms the dot product computation in the high-dimensional feature space into a kernel function evaluation in the input space, avoiding the curse of dimensionality. We can then focus on the kernel function instead of $\phi$. The linear kernel, polynomial kernels, and Gaussian radial basis function (RBF) kernels are commonly used in kernel function design.
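For concreteness, the three kernel families mentioned above can be written directly as functions of input-space vectors; the parameter names here (d for the polynomial degree, beta for the RBF width) are illustrative:

```python
import numpy as np

# Common Mercer kernels: each evaluates an inner product in some
# implicit feature space F without ever computing the mapping phi.
def linear_kernel(a, b):
    return float(a @ b)

def polynomial_kernel(a, b, d=2):
    return float(a @ b + 1.0) ** d

def rbf_kernel(a, b, beta=0.5):
    return float(np.exp(-beta * np.sum((a - b) ** 2)))

a = np.array([1.0, 2.0])
b = np.array([0.5, -1.0])
# Mercer kernels are symmetric: k(a, b) == k(b, a),
# and the RBF kernel satisfies k(a, a) == 1.
```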

KSRC
Note that kernel methods are effective for linearly non-separable problems. In this section, we propose a new classification method, called KSRC, which is a kernel-based sparse representation. We can recognize vehicles by solving equation (5) in SRC, but in kernel methods, we must construct a Mercer kernel. To make the training samples separable, we assume there exists a feature mapping function $\phi$ that maps the test sample y and the dictionary D from the input space $\mathcal{X}$ to a high-dimensional kernel feature space $\mathcal{F}$

$$y \rightarrow \phi(y), \quad D = [d_1, \ldots, d_n] \rightarrow D_\phi = [\phi(d_1), \ldots, \phi(d_n)] \qquad (8)$$

In SRC, the test sample is sparsely represented by the training samples in the input space $\mathcal{X}$. Similarly, we arrive at a kernel sparse representation of the test sample in the kernel feature space

$$\phi(y) = D_\phi x \qquad (9)$$

Likewise, the optimization problem in equation (5) can be mapped into the high-dimensional space $\mathcal{F}$ as

$$J(x) = \frac{1}{2}\|\phi(y) - D_\phi x\|_2^2 + \lambda \|x\|_1 \qquad (10)$$

However, since the mapping function $\phi$ is unknown, the solution of KSRC is based on the selected kernel function. When the dictionary D is fixed, equation (10) can be rewritten as

$$J(x) = L(x) + \lambda \|x\|_1 \qquad (11)$$

where $L(x) = \frac{1}{2}k(y, y) - k(y, D)x + \frac{1}{2}x^T K(D, D)x$, with the row vector $k(y, D) = [k(y, d_1), \ldots, k(y, d_n)]$ and the Gram matrix $K(D, D)$ whose entries are $k(d_i, d_j)$.
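The Gram-matrix expansion of equation (10) can be checked numerically: with the linear kernel, where the mapping is the identity, the kernel-space objective must coincide with the plain SRC objective. A sketch (function and variable names are ours):

```python
import numpy as np

def ksrc_objective(x, k_yy, k_yD, K_DD, lam):
    """J(x) = 1/2 k(y,y) - k(y,D) x + 1/2 x^T K(D,D) x + lam * ||x||_1,
    i.e. the kernel-space objective written so that only kernel
    evaluations of the test sample and training data are needed."""
    return 0.5 * k_yy - k_yD @ x + 0.5 * x @ K_DD @ x + lam * np.abs(x).sum()

# Sanity check with the linear kernel, where phi is the identity:
rng = np.random.default_rng(1)
D = rng.standard_normal((8, 12))
y = rng.standard_normal(8)
x = rng.standard_normal(12)
lam = 0.1
J_kernel = ksrc_objective(x, y @ y, y @ D, D.T @ D, lam)
J_direct = 0.5 * np.sum((y - D @ x) ** 2) + lam * np.abs(x).sum()
```

The two values agree term by term, since $\frac{1}{2}\|y - Dx\|_2^2 = \frac{1}{2}y^Ty - (y^TD)x + \frac{1}{2}x^TD^TDx$.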

MKSRC
In this article, to determine the best kernel for our classification task, we propose MKSRC, which assumes that a sample a can be mapped into several different feature spaces by nonlinear mapping functions $\phi_m(a)$ ($m = 1, 2, \ldots, p$) with different weights $\alpha_m$ to achieve the best classification performance. The corresponding kernel function for samples a and b is defined as

$$k(a, b) = \sum_{m=1}^{p} \alpha_m^2 k_m(a, b) \qquad (12)$$

where $k_m(a, b) = \phi_m(a)^T \phi_m(b)$ and the kernel weights satisfy $\sum_{m=1}^{p} \alpha_m^2 = 1$, $\alpha_m \ge 0$. Within this MKSRC framework, the optimization problem in equation (5) can be mapped into a high-dimensional feature space $\mathcal{F}$, and the kernel-dependent part can be written as

$$J = \frac{1}{2}k(y, y) - k(y, D)\bar{x} + \frac{1}{2}\bar{x}^T K(D, D)\bar{x} + \lambda \|\bar{x}\|_1 \qquad (13)$$

Here, according to Lemma 2 in Chapelle et al., 28 it is possible to differentiate J with respect to $\alpha_m^2$ as if $\bar{x}$ did not depend on $\alpha_m^2$ (where $\bar{x}$ is the vector x at which the extreme value of J is attained). We have

$$\nabla_{\alpha_m^2} J = \frac{1}{2}k_m(y, y) - k_m(y, D)\bar{x} + \frac{1}{2}\bar{x}^T K_m(D, D)\bar{x} \qquad (14)$$

Using the gradient descent method, we can update the kernel weights by

$$(\alpha_m^2)^{n+1} = (\alpha_m^2)^n - \epsilon_n \nabla_{\alpha_m^2} J \qquad (15)$$

and solve the optimization problem. Algorithm 2 summarizes the proposed MKSRC method.
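The multiple kernel combination and the constrained weight update can be sketched as follows; the clip-and-renormalize step used here to keep $\sum_m \alpha_m^2 = 1$ after the gradient step is one simple projection choice, not necessarily the paper's exact scheme:

```python
import numpy as np

def combined_gram(alphas, gram_list):
    """K = sum_m alpha_m^2 * K_m, following the MKSRC kernel definition."""
    return sum(a ** 2 * K for a, K in zip(alphas, gram_list))

def update_weights(alphas, grads, step):
    """One gradient step on alpha_m^2, then clip to >= 0 and renormalize
    so that sum_m alpha_m^2 = 1 (an assumed projection step)."""
    sq = np.maximum(alphas ** 2 - step * grads, 0.0)
    sq /= sq.sum()
    return np.sqrt(sq)

p = 3
alphas = np.full(p, 1.0 / np.sqrt(p))      # uniform initialization
grads = np.array([0.2, -0.1, 0.05])        # placeholder gradient values
alphas = update_weights(alphas, grads, step=0.5)
```

After each update the weights remain a valid point on the constraint set, so the combined kernel stays a convex-like mixture of Mercer kernels.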
The convergence threshold q plays a critical role in determining the complexity of the algorithm, and it should be adjusted to the requirements of the application. In addition, $\delta_j(x)$ is a vector whose only non-zero entries are the entries in x associated with class j. 10 By computing the residuals, we classify the test sample y into the class that minimizes the residual error.

Algorithm 2. Multiple kernel sparse representation-based classification algorithm.

Input: the dictionary D composed of l classes of training samples; a test sample y; the number of kernel functions p.
1. Initialize the kernel weights $\alpha_m = 1/\sqrt{p}$.
2. Compute the coefficients $\bar{x} = \arg\min_x \{\frac{1}{2}\|\phi(y) - D_\phi x\|_2^2 + \lambda \|x\|_1\}$ with the kernel function $k(a, b) = \sum_{m=1}^{p} \alpha_m^2 k_m(a, b)$.
3. Update the kernel weights by $(\alpha_m^2)^{n+1} = (\alpha_m^2)^n - \epsilon_n \nabla_{\alpha_m^2} J$.
4. Go back to step 2 until the convergence condition $\|\alpha^{n+1} - \alpha^n\| \le q$ is met.
5. Compute the residuals $r_j(y) = \frac{1}{2}\|\phi(y) - D_\phi \delta_j(\bar{x})\|_2^2$ for $j = 1, \ldots, l$.
Output: Identity$(y) = \arg\min_j \{r_j\}$.


Dictionary learning methods

The key component in sparse representation is the construction of an over-complete dictionary, and it is crucial to choose an appropriate one. One can use pre-determined dictionaries, such as undecimated wavelets, 29 steerable wavelets, 30 and curvelets. 31 Their major advantage is simplicity and low complexity; however, their performance largely depends on the specific characteristics of the target signal. To address this issue, in this article we introduce a dictionary learning method to update the over-complete dictionary and represent the signals sparsely. Dictionary learning has been widely used in many signal processing applications, such as image compression and enhancement 32 and classification tasks. 33 To update an over-complete dictionary D in equation (5), a typical dictionary learning algorithm iterates a two-stage procedure: sparse coding and dictionary update. Sparse coding computes the representation coefficients x from the given signal and the current dictionary; the dictionary is then updated in stage 2 to reduce the representation error.

The MOD introduced by Engan et al. 17,34 is one of the first methods to implement sparsification. 15 Like other learning methods, MOD alternates two steps: a sparse coding stage using OMP, followed by an update of the dictionary. The aim of MOD is to find a dictionary D and a sparse matrix x that minimize the representation error

$$\arg\min_{D, x} \|y - Dx\|_F^2 \quad \text{s.t.} \quad \|x_i\|_0 \le T_0 \qquad (17)$$

where $\{x_i\}$ are the columns of x and $\|\cdot\|_F$ denotes the Frobenius norm, $\|A\|_F = \sqrt{\sum_{ij} A_{ij}^2}$. Suppose the sparse matrix x is fixed; we can find the D that minimizes the above error by setting the derivative of equation (17) with respect to D to zero. To achieve $(y - Dx)x^T = 0$, we use the iterative update

$$D^{(n+1)} = y\,(x^{(n)})^T \left( x^{(n)} (x^{(n)})^T \right)^{-1} \qquad (18)$$

In this article, we propose to use K-SVD, which is more efficient than MOD, to update the dictionary. The K-SVD algorithm is based on an SVD process (K is the number of columns in D), and the dictionary update is performed atom-by-atom in a simple and efficient way. 15 Assuming that D and x are fixed, to update the kth column $d_k$ of D, the objective function can be rewritten as

$$\|y - Dx\|_F^2 = \left\| y - \sum_{j \ne k} d_j x_T^j - d_k x_T^k \right\|_F^2 = \|E_k - d_k x_T^k\|_F^2 \qquad (19)$$

where $x_T^j$ stands for the jth row of x. In the above expression, we aim to update both $d_k$ and $x_T^k$ by a simple rank-1 approximation 35 of the pre-computed error term $E_k = y - \sum_{j \ne k} d_j x_T^j$.
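One pass of the K-SVD atom update can be sketched as a rank-1 SVD of the error matrix restricted to the samples that use the atom (a minimal illustration with our own variable names; Y denotes the matrix of training signals):

```python
import numpy as np

def ksvd_atom_update(Y, D, X, k):
    """Update atom d_k and coefficient row x_T^k via the best rank-1
    approximation of E_k = Y - sum_{j != k} d_j x_T^j, restricted to
    the samples that actually use atom k (so sparsity is preserved)."""
    omega = np.nonzero(X[k, :])[0]
    if omega.size == 0:
        return D, X                          # atom unused; leave as-is
    E_k = Y - D @ X + np.outer(D[:, k], X[k, :])
    U, s, Vt = np.linalg.svd(E_k[:, omega], full_matrices=False)
    D[:, k] = U[:, 0]                        # new unit-norm atom
    X[k, omega] = s[0] * Vt[0, :]            # matching coefficients
    return D, X

# The update never increases the representation error:
rng = np.random.default_rng(2)
Y = rng.standard_normal((10, 30))
D = rng.standard_normal((10, 15))
D /= np.linalg.norm(D, axis=0)
X = rng.standard_normal((15, 30)) * (rng.random((15, 30)) < 0.2)
err_before = np.linalg.norm(Y - D @ X)
D, X = ksvd_atom_update(Y, D, X, k=0)
err_after = np.linalg.norm(Y - D @ X)
```

Because the SVD gives the best rank-1 approximation of the restricted error, each atom update monotonically decreases (or preserves) the total representation error, which is what makes the atom-by-atom sweep converge in practice.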

Experimental results
In this section, we evaluate the performance of the proposed method on a dataset collected from a real-world wireless distributed sensor network (WDSN) near Twenty-Nine Palms, CA, in November 2001. This dataset is available at http://www.ecs.umass.edu/~mduarte/Software.html. 2 It contains the acoustic, seismic, and IR signatures of two types of military vehicles, namely, the Assault Amphibian Vehicle (AAV) and the Dragon Wagon (DW). The original time series data were collected from 18 sensor nodes on 3 routes, as shown in Figure 2. Each node has three types of sensors: a microphone, a geophone, and a polarized IR sensor. These sensors cover a field of about 900 × 300 m², which consists of an east-west road, a south-north road, and an intersection. Each record in the dataset represents a vehicle passing by at a constant speed. Note that the Doppler effect causes changes in the frequencies of the measured signal. Like Duarte and Hu, 2 we do not consider this Doppler effect since the relative speed between the moving vehicles and the sensor nodes is stable and relatively slow.

Feature extraction
In this experiment, we aim to recognize each vehicle using the acoustic data, which were recorded at a rate of 4960 Hz by microphones mounted on the sensor nodes. First, we choose the data collected from the 3rd to 11th runs (AAV3-AAV11 and DW3-DW11) as the data source to assess different feature extraction and classification methods. To detect the useful events in the raw time series, we use the CFAR detection algorithm, 2 which marks times with high energy values. Then, we use the MFCC method to extract features from the event time series for classification.
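The event-marking idea can be illustrated with a simple energy detector; this is only a stand-in for the actual CFAR detector of Duarte and Hu, using a median-based noise estimate and a hypothetical threshold factor:

```python
import numpy as np

def detect_events(signal, frame_len=512, factor=3.0):
    """Flag frames whose energy exceeds `factor` times the median
    frame energy. Hop size is half the frame length, matching the
    paper's default frame increment."""
    hop = frame_len // 2
    starts = range(0, len(signal) - frame_len + 1, hop)
    energies = np.array([np.sum(signal[s:s + frame_len] ** 2)
                         for s in starts])
    threshold = factor * np.median(energies)
    return energies > threshold

# Quiet background with one loud burst: only the burst frames are flagged.
sig = np.full(4096, 0.1)
sig[2048:2560] = 2.0
flags = detect_events(sig)
```

A real CFAR detector adapts its threshold locally to keep the false alarm probability constant; the median here is a crude global surrogate for that noise estimate.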
Choosing an arbitrary set of the above data, we compare its 12-dimensional MFCC features with the 50-dimensional features extracted by the fast Fourier transform (FFT), 2 which is calculated for every 512-point sample (every 103.2 ms at the current sample rate of 4960 Hz). Figure 3(b) and (c) shows the distinct difference between them, where the x-axes stand for the frame number and the y-axes represent the magnitude of each feature item. The MFCC features concentrate mainly in the lower feature items compared with the FFT ones.

Dictionary learning
For cross-validation, after feature extraction, we divide the acoustic features into two parts, one used as test samples and the other as training samples. For vehicle recognition, we need to compute the sparse representation of the test samples over a specific dictionary. The initial dictionary consists of the acoustic features of the training samples. Then, to better fit the current dataset, we use the K-SVD approach to update the initial dictionary. Using the OMP algorithm, we first obtain the corresponding sparse matrix of the test samples. The sparsity level, that is, the number of non-zero coefficients in the sparse matrix, affects both the recognition performance and the computational complexity of the algorithm. Assuming there are 100 training samples in the dictionary, we demonstrate the sparse coding result of a test sample in Figure 4 with different sparsity levels (K = 30 and K = 60). Figure 5 shows the relationship between the sparsity level and the time consumed by the sparse coding process for a single sample. The algorithm is written in MATLAB without optimization for speed and ran on a laptop with a 1.60-GHz Core i5 and 4 GB of memory. We can see that complexity increases significantly with the sparsity level.
To further study the impact of different dictionaries on the algorithm performance, we define the relative error of the sparse representation as

$$e_r = \frac{\|y - D\hat{x}\|_2}{\|y\|_2} \qquad (20)$$

Compared with the initial dictionary composed of the original training samples, the dictionary updated by K-SVD (with eight iterations) shows significantly improved performance, as we can see from Figure 6. The relative error decreases significantly with the sparsity level. This implies that, in practice, we need to choose an appropriate sparsity level and find a good tradeoff between the relative error and complexity.
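The relative error defined above (in our notation) is straightforward to compute; a zero coefficient vector gives a relative error of exactly 1, which is a useful sanity check:

```python
import numpy as np

def relative_error(y, D, x_hat):
    """Relative sparse-representation error ||y - D x_hat||_2 / ||y||_2."""
    return np.linalg.norm(y - D @ x_hat) / np.linalg.norm(y)

# An exact 1-sparse representation yields zero relative error.
rng = np.random.default_rng(3)
D = rng.standard_normal((20, 40))
x = np.zeros(40)
x[5] = 2.0
y = D @ x
```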

Classification methods
To solve the vehicle recognition problem with the proposed MKSRC model, the features are mapped into a high-dimensional feature space with the kernel function. Here, we choose two common kernels, the polynomial kernel (22) and the Gaussian RBF kernel (23)

$$k(x, x') = (x^T x' + 1)^d \qquad (22)$$

$$k(x, x') = \exp(-b \|x - x'\|_2^2) \qquad (23)$$

where d and b are parameters related to the characteristics of the acoustic features, tuned using cross-validation. We compare our MKSRC method with the existing classification methods SVM 2,36 and SRC. 10,37,38 In our experiments, there are 90 samples for each vehicle, collected from 9 runs (3-11) of 10 sensor nodes (51-56 and 58-61), as well as 90 samples of the noise in the acquisition process. To validate the results of a classifier, we employ threefold cross-validation with stratified partition of the samples. The classifier is trained three times, and each time a different set is used as the validation set. Tables 1 and 2 present the detection, false alarm, and classification rates based on the FFT and MFCC features. Here, the detection rate is defined as the ratio between the number of correctly classified samples and the size of the class. The false alarm rate is defined as the ratio between the number of incorrectly classified samples and the total number of samples in the other classes. Furthermore, to analyze the effect of different dictionaries, we list the classification results based on the dictionary updated by K-SVD in Table 3. The kernel parameters d and b are tuned by cross-validation; in this work, we obtained d = 2 and b = 0.0002. For feature extraction, the MFCC method has a distinct advantage over FFT since it accounts for human auditory characteristics. For the choice of dictionary, we find that our method using the updated dictionary outperforms the other classifiers based on sparse representation. From Table 3, we can see that the proposed MKSRC achieves a high classification rate of 90.00%, outperforming the other methods while achieving lower false alarm rates.
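The detection and false alarm rates as defined above can be computed from a confusion matrix (rows = true class, columns = predicted class); the worked numbers below are made up for illustration, not taken from Tables 1-3:

```python
import numpy as np

def detection_rate(confusion, i):
    """Correctly classified samples of class i over the class size."""
    return confusion[i, i] / confusion[i, :].sum()

def false_alarm_rate(confusion, i):
    """Samples of other classes wrongly assigned to class i, over the
    total number of samples in the other classes."""
    others = np.delete(np.arange(confusion.shape[0]), i)
    return confusion[others, i].sum() / confusion[others, :].sum()

# Hypothetical two-class confusion matrix (30 samples per class):
# 27 of 30 class-0 samples correct, 2 class-1 samples misassigned to 0.
C = np.array([[27, 3],
              [2, 28]])
```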
To further illustrate the performance of MKSRC, we examine the normalized correlation between the sparse codes produced by SRC, KSRC, and MKSRC. We show the results for two classes (each containing 30 samples) in Figure 7, where the x-axis and y-axis both index the samples; the first 30 samples and the remaining ones (31-60) each come from a single class. The entry (p, q) is the normalized correlation of the sparse codes of the acoustic test samples p and q.
By the definition of correlation, the normalized correlation of the sparse codes should be block-wise, since sparse codes belonging to the same class are more similar. In Figure 7, we find that MKSRC produces more discriminative sparse codes: the correlation coefficients within the same class (the first 30 samples belong to AAV and the rest to DW) are generally higher for MKSRC than for SRC and KSRC, which leads to better classification performance.
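The normalized correlation matrix of Figure 7 can be reproduced from any set of sparse codes as follows (a sketch with our own variable names; codes are stored as columns, and the resulting matrix is symmetric with a unit diagonal):

```python
import numpy as np

def normalized_correlation(codes):
    """Entry (p, q) is |x_p . x_q| / (||x_p|| ||x_q||) for the sparse
    codes stored in the columns of `codes`."""
    norms = np.linalg.norm(codes, axis=0, keepdims=True)
    C = (codes.T @ codes) / (norms.T @ norms)
    return np.abs(C)

rng = np.random.default_rng(4)
codes = rng.standard_normal((12, 6))        # 6 codes of length 12
C = normalized_correlation(codes)
```

Block structure in this matrix (high within-class, low between-class entries) is exactly the discriminativeness the figure is meant to show.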

Conclusion
In this work, we have studied the problem of vehicle recognition using acoustic sensor networks and developed a new method, called MKSRC, for vehicle recognition. Acoustic features of vehicles are extracted and mapped into a high-dimensional feature space using a kernel function that combines multiple kernels to obtain linearly separable samples. To improve the recognition accuracy, we incorporate dictionary learning into the MKSRC framework. By calculating the reconstruction error and updating the kernel weights, the target vehicles are recognized by solving the resulting optimization problem. Our extensive experimental results demonstrate that the proposed MKSRC method with learned dictionaries outperforms existing methods based on SVM, SRC, and KSRC on vehicle recognition from complex acoustic sensor network datasets. In our future work, we will focus on the self-adaptation of the kernel parameters to further improve recognition efficiency and robustness.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.