Distributed Face Recognition Using Multiple Kernel Discriminant Analysis in Wireless Sensor Networks

This paper proposes a module-based distributed wireless face recognition system by integrating multiple kernel discriminant analysis with face recognition in wireless sensor networks. For each module, we separately run an iterative kernel parameter optimization scheme that maximizes the maximum margin criterion (MMC). Simulations on the FERET and CMU PIE face databases show that our multiple kernel framework and the optimization procedure achieve higher recognition accuracy than single-kernel KDDA.


Introduction
Face recognition (FR) is one of the most important biometric techniques and is used in a wide range of security applications such as access control, identification systems, and surveillance [1]. FR is a contactless biometric technique; it is natural and passive, unlike biometric techniques that require cooperative subjects, such as fingerprint recognition and iris recognition [2]. A typical framework of an FR system is shown in Figure 1, including the procedures of enrollment and identification [3].
In recent years, FR systems combined with wireless sensor networks (WSNs) [4] have attracted great interest, as WSNs are very helpful for contactless biometric security applications. For example, Kim et al. implement a wireless face recognition system based on the ZigBee protocol and the principal component analysis (PCA) method with low energy consumption [5]. Muraleedharan et al. propose the use of a specific evolutionary algorithm to optimize routing in a distributed time-varying network for face recognition [6]. Chang and Aghajan focus on recovering face orientation for more robust face recognition in wireless image sensor networks [7]. Zaeri et al. propose an application of face recognition for wireless surveillance systems [8].
Because of image variations such as pose, illumination, and facial expression, face recognition is a highly complex and nonlinear problem that cannot be sufficiently handled by linear methods, such as principal component analysis (PCA) [9] and linear discriminant analysis (LDA) [10]. Therefore, it is reasonable to assume that a better solution to this inherently nonlinear problem could be achieved using nonlinear methods, such as the so-called kernel machine techniques [11]. Following the success of applying the kernel trick in support vector machines (SVMs) [12], many kernel-based PCA and LDA methods have been developed and applied in pattern recognition tasks, such as kernel PCA (KPCA) [13], kernel Fisher discriminant (KFD) [14], generalized discriminant analysis (GDA) [15], and kernel direct discriminant analysis (KDDA) [16].
It has been shown that the kernel-based LDA method is a feasible approach to solving the nonlinear problems in face recognition. However, the performance of the kernel-based LDA method is sensitive to the selection of a kernel function and its parameters. To date, kernel parameter selection has mainly been achieved by cross-validation [17], which is computationally expensive, and the selected kernel parameters cannot be guaranteed to be optimal. Furthermore, a single, fixed kernel can only characterize some aspects of the geometrical structure of the input data and thus is not always suitable for applications that involve data from multiple, heterogeneous sources [18,19].
Recent applications and developments based on SVMs [20,21] have shown that using multiple kernels (i.e., a combination of several "base kernels") instead of a single fixed one can enhance classifier performance, which gave rise to the so-called multiple kernel learning (MKL) method. With several kernels, input data can be mapped into as many feature spaces, where each feature space can be taken as one view of the original input data [19]. Each view is expected to exhibit some geometrical structures of the original data from its own perspective, such that all the views can complement each other for the subsequent learning task. It has been shown that MKL offers the needed flexibility and handles well the case that involves multiple, heterogeneous data sources [18,22,23]. However, MKL was proposed for SVMs, and there have been few reports on the performance of the kernel-based LDA method with multiple kernels. Liu and Feng propose multiple kernel Fisher discriminant analysis (MKFD) with an iterative scheme for weight optimization [24], in which the constructed kernel is a linear combination of several base kernels with a constraint on their weights.
In this paper, we integrate multiple kernel discriminant analysis with face recognition in wireless sensor networks and propose a module-based distributed wireless face recognition system. We consider a separate cluster head for each module, that is, forehead, eyes, nose, and lips. Only the local cluster is responsible for internal module processing, for both training and recognition.
The rest of this paper is organized as follows. First, we describe the module-based distributed wireless face recognition system in Section 2. Then, in Section 3, the optimization scheme for the multiple kernels is presented. The simulation results are reported in Section 4, and we draw our conclusion in Section 5.

Module Based Distributed Wireless Face Recognition
We present a face recognition system in wireless sensor networks where training and recognition are both performed in a distributed environment. The image is divided into four submodules, that is, forehead, eyes, nose, and lips, as shown in Figure 2. For face recognition tasks, enrollment and identification of each submodule are performed in separate cluster heads, and the computations are carried out in kernel feature space [12]. Each cluster head is responsible for processing its submodule and communicating with the sink cluster, which performs the score-level fusion.
The following describes the score-level fusion criterion at the sink node. Suppose there are $N$ images belonging to $C$ subjects. Denote the membership degree of the $i$th image belonging to the $j$th subject as $\mu_{ij}$, $i = 1, 2, \ldots, N$, $j = 1, 2, \ldots, C$. $\mu_{ij}$ is obtained as follows:

$$\mu_{ij} = \sum_{k=1}^{4} w_k \, s_{ij}(k), \tag{1}$$

where $s_{ij}(k)$ denotes the score of the $k$th submodule from the $i$th image with regard to the $j$th subject, and $w_k$ is the corresponding weight value, $k = 1, 2, 3, 4$. The $i$th image is assigned to the $c$th subject if and only if $\mu_{ic} = \max_{1 \le j \le C} \mu_{ij}$.

Now we explain the score $s_{ij}(k)$, taking the forehead module ($k = 1$) as an example. In the forehead module cluster, there are also $N$ samples belonging to $C$ classes. For the $i$th sample $\mathbf{x}_i$, we can compute the squared kernel distance between $\mathbf{x}_i$ and the center of the $j$th class $\mathbf{m}_j$ as follows:

$$d_{ij} = \left\| \Phi(\mathbf{x}_i) - \mathbf{m}_j \right\|^2, \tag{2}$$

where $\Phi: \mathbf{x} \in \mathbb{R}^n \to \Phi(\mathbf{x}) \in F$ is a nonlinear mapping, which is implicitly defined by a Mercer kernel function $k(\mathbf{x}, \mathbf{y}) = \Phi(\mathbf{x})^T \Phi(\mathbf{y})$ [12,25]. Then we sort the $C$ distances $\{d_{ij} \mid j = 1, 2, \ldots, C\}$ in ascending order and, from the sorted squared kernel distances, obtain the scores $\{s_{ij}(1) \mid j = 1, 2, \ldots, C\}$. Thus, given the $i$th sample, the smaller the kernel distance between it and the center of the $j$th class, the greater $s_{ij}(1)$ is, $j = 1, 2, \ldots, C$.
Scores $s_{ij}(2)$ to $s_{ij}(4)$ can be obtained from the eyes, nose, and lips modules, respectively, in the same way as score $s_{ij}(1)$.
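The distance and fusion steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the function names are hypothetical, the kernel values are supplied by the caller, and `distances_to_scores` is only a placeholder decreasing map (smaller distance, larger score), not the paper's scoring formula. The example weights are the ones reported in the Simulation section.

```python
import numpy as np

def squared_kernel_distances(k_xx, k_xt, K, labels):
    """Squared feature-space distance from one sample to every class center.

    Uses ||Phi(x) - m_j||^2 = k(x,x) - 2*mean_a k(x,x_a) + mean_{a,b} k(x_a,x_b),
    where a and b range over the training samples of class j.
    """
    classes = np.unique(labels)
    d = np.empty(len(classes))
    for j, c in enumerate(classes):
        idx = labels == c
        d[j] = k_xx - 2.0 * k_xt[idx].mean() + K[np.ix_(idx, idx)].mean()
    return d

def distances_to_scores(d):
    """Placeholder (assumption): any decreasing map of distance would do here."""
    e = np.exp(-(d - d.min()))
    return e / e.sum()

def fuse_and_classify(scores, weights):
    """scores has shape (4, C), one row per module. Returns (subject index, mu)."""
    mu = weights @ scores  # weighted sum over the four modules
    return int(np.argmax(mu)), mu

# Toy check with a linear kernel, where the distances can be verified by hand.
X = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]])
labels = np.array([0, 0, 1, 1])
x = np.array([1.0, 1.0])
d = squared_kernel_distances(x @ x, X @ x, X @ X.T, labels)

weights = np.array([0.1, 0.3, 0.4, 0.2])          # forehead, eyes, nose, lips
scores = np.tile(distances_to_scores(d), (4, 1))  # same toy scores in every module
subject, mu = fuse_and_classify(scores, weights)
```

Passing kernel values rather than raw samples keeps the distance routine independent of the kernel choice, so each cluster head could plug in its own kernel.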

Optimization of Multiple Kernels
From Section 2, we can see that the computations in the proposed distributed wireless face recognition framework are based on kernel techniques. The framework has four computing modules, for the forehead, eyes, nose, and lips, respectively. These four kinds of data are so different that a single kernel is unlikely to classify all of them well. We therefore propose the use of multiple kernels for these four modules. Specifically, we use the Gaussian RBF kernel but with different values of the parameter $\sigma$ for the different computing modules. Every module should have its own optimal kernel parameter. For each module, we separately perform the following kernel parameter optimization procedure.
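As an illustration of one kernel per module, the sketch below builds a separate Gaussian RBF kernel matrix for each module. The kernel form $\exp(-\|\mathbf{x}-\mathbf{y}\|^2 / (2\sigma^2))$, the feature dimensions, and the $\sigma$ values are all assumptions chosen for the example; they are not taken from the paper.

```python
import numpy as np

def gaussian_kernel_matrix(X, sigma):
    """Gaussian RBF kernel matrix for the rows of X (assumed 1/(2*sigma^2) scaling)."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from round-off
    return np.exp(-sq_dists / (2.0 * sigma**2))

rng = np.random.default_rng(0)

# Hypothetical module data: 6 samples per module, arbitrary feature sizes.
module_dims = {"forehead": 20, "eyes": 30, "nose": 15, "lips": 10}
module_data = {name: rng.random((6, dim)) for name, dim in module_dims.items()}

# Placeholder per-module kernel parameters (in the paper these are optimized).
sigmas = {"forehead": 1.5, "eyes": 2.0, "nose": 1.0, "lips": 0.8}

kernels = {
    name: gaussian_kernel_matrix(X, sigmas[name])
    for name, X in module_data.items()
}
```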

Some Notations on
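Each entry of the kernel matrix $\mathbf{K}$ is a kernel evaluation $k(\mathbf{x}, \mathbf{x}') = \Phi(\mathbf{x})^T \Phi(\mathbf{x}')$ between two training samples $\mathbf{x}$ and $\mathbf{x}'$; the kernel matrix thus corresponds to the nonlinear mapping $\Phi$.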
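Let $X_i$ denote the $i$th class, with $N_i$ samples, and $X$ the whole sample set, with $N$ samples in $C$ classes. Under the nonlinear mapping $\Phi$, the $i$th mapped class and the mapped sample set are, respectively, given by

$$\Phi(X_i) = \{\Phi(\mathbf{x}) \mid \mathbf{x} \in X_i\}, \qquad \Phi(X) = \{\Phi(\mathbf{x}) \mid \mathbf{x} \in X\}.$$

Also, the mean of the mapped class $\Phi(X_i)$ and that of the mapped sample set $\Phi(X)$ are, respectively, given by

$$\mathbf{m}_i = \frac{1}{N_i} \sum_{\mathbf{x} \in X_i} \Phi(\mathbf{x}), \qquad \mathbf{m} = \frac{1}{N} \sum_{\mathbf{x} \in X} \Phi(\mathbf{x}).$$

In kernel feature space $F$ (let $f$ be the dimensionality of $F$), the within-class scatter matrix $\mathbf{S}_w^{\Phi}$ and the between-class scatter matrix $\mathbf{S}_b^{\Phi}$ are defined from the mapped samples and these means, and the kernel Fisher criterion $J^{\Phi}(\mathbf{W})$ is defined on the two scatter matrices, where $\mathbf{W} = \{\mathbf{w}_1, \ldots, \mathbf{w}_m\}$ is an $f \times m$ ($f > m$) projection matrix. Kernel discriminant analysis finds an optimal projection matrix $\mathbf{W}^*: \mathbb{R}^f \to \mathbb{R}^m$ in the mapped feature space $F$, such that $\mathbf{W}^* = \arg\max_{\mathbf{W}} J^{\Phi}(\mathbf{W})$.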
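For reference, the scatter matrices and the Fisher criterion are commonly written as below. This is a sketch of the standard textbook forms, not a transcription of the paper's equations: the $1/N$ normalization, the determinant-ratio form of the criterion, and the symbols $X_i$ (the $i$th class, with $N_i$ samples), $\mathbf{m}_i$ (its mean in feature space), and $\mathbf{m}$ (the overall mean of all $N$ mapped samples) are assumptions.

```latex
\begin{align}
\mathbf{S}_w^{\Phi} &= \frac{1}{N}\sum_{i=1}^{C}\sum_{\mathbf{x}\in X_i}
  \bigl(\Phi(\mathbf{x})-\mathbf{m}_i\bigr)\bigl(\Phi(\mathbf{x})-\mathbf{m}_i\bigr)^{T},\\
\mathbf{S}_b^{\Phi} &= \frac{1}{N}\sum_{i=1}^{C} N_i\,
  (\mathbf{m}_i-\mathbf{m})(\mathbf{m}_i-\mathbf{m})^{T}
  = \Phi_b\Phi_b^{T},\\
\Phi_b &= \Bigl[\sqrt{N_1/N}\,(\mathbf{m}_1-\mathbf{m}),\;\ldots,\;
  \sqrt{N_C/N}\,(\mathbf{m}_C-\mathbf{m})\Bigr],\\
J^{\Phi}(\mathbf{W}) &= \frac{\bigl|\mathbf{W}^{T}\mathbf{S}_b^{\Phi}\mathbf{W}\bigr|}
  {\bigl|\mathbf{W}^{T}\mathbf{S}_w^{\Phi}\mathbf{W}\bigr|}.
\end{align}
```

A common scale factor on both scatter matrices does not change the maximizer of the criterion, so the choice of normalization does not affect the resulting projection.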

Diagonalization Strategy.
We use the same diagonalization strategy as KDDA [16] to deal with the small sample size (SSS) problem; that is, we first diagonalize $\mathbf{S}_b^{\Phi}$ to $\mathbf{I}$ (the identity matrix) and then diagonalize $\mathbf{S}_w^{\Phi}$ to $\Lambda_w$, as briefly described below.

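Eigenanalysis of $\mathbf{S}_b^{\Phi}$ in the Feature Space.
$\Phi_b^T \Phi_b$ can be expressed using the kernel matrix $\mathbf{K}$ as in (11), where $\mathbf{D} = \operatorname{diag}(\sqrt{N_1}, \ldots, \sqrt{N_C})$ and $N_i$ is the number of samples in the $i$th class. This step yields a matrix $\mathbf{U}$ with $\mathbf{U}^T \mathbf{S}_b^{\Phi} \mathbf{U} = \mathbf{I}$. For the second step, $\Phi_b^T \mathbf{S}_w^{\Phi} \Phi_b$ can also be expressed using $\mathbf{K}$, with details similar to those in [16]. Let $\mathbf{z}_i$ be the eigenvector of $\mathbf{U}^T \mathbf{S}_w^{\Phi} \mathbf{U}$ corresponding to the $i$th smallest eigenvalue $\lambda_i$, $i = 1, \ldots, m$. Denote $\mathbf{Z} = (\mathbf{z}_1, \ldots, \mathbf{z}_m)$. Defining $\mathbf{Y} = \mathbf{U}\mathbf{Z}$, it can be derived that $\mathbf{Y}^T \mathbf{S}_w^{\Phi} \mathbf{Y} = \Lambda_w$, with $\Lambda_w = \operatorname{diag}(\lambda_1, \ldots, \lambda_m)$. Based on the derivation presented in Sections 3.2.1 and 3.2.2, an optimal projection matrix $\mathbf{W}^*$ for kernel discriminant analysis is obtained as in (13). Because the nonlinear mapping $\Phi$ is only implicitly defined by the kernel function (or matrix), $\Phi_b$ (defined by (9)) remains unknown, and $\mathbf{W}^*$ cannot be evaluated explicitly. The real purpose of (13) is to obtain the matrix $\mathbf{G}$, which can be computed from the kernel matrix $\mathbf{K}$. This is the core result of the diagonalization process.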
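To make the idea of expressing $\Phi_b^T \Phi_b$ through the kernel matrix concrete, the sketch below computes it from $\mathbf{K}$ alone and checks the result against an explicit computation with a linear kernel, where the mapping is the identity. It assumes the common definition in which the $i$th column of $\Phi_b$ is $\sqrt{N_i/N}\,(\mathbf{m}_i - \mathbf{m})$; the function name and the helper matrices `A` and `ones` are illustrative, and the paper's own expression (11) may be arranged differently.

```python
import numpy as np

def between_class_gram(K, labels):
    """Phi_b^T Phi_b computed from the kernel matrix only.

    Assumes column i of Phi_b is sqrt(N_i / N) * (m_i - m).
    """
    classes, counts = np.unique(labels, return_counts=True)
    N, C = len(labels), len(classes)
    A = np.zeros((N, C))                      # column j averages over class j
    for j, c in enumerate(classes):
        A[labels == c, j] = 1.0 / counts[j]
    ones = np.ones((N, C))
    D = np.diag(np.sqrt(counts))
    inner = (A.T @ K @ A
             - (A.T @ K @ ones) / N
             - (ones.T @ K @ A) / N
             + (ones.T @ K @ ones) / N**2)    # entries (m_i - m)^T (m_j - m)
    return D @ inner @ D / N

# Check with a linear kernel, where Phi(x) = x and Phi_b can be built explicitly.
rng = np.random.default_rng(1)
X = rng.random((9, 5))
labels = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])
K = X @ X.T

m = X.mean(axis=0)
cols = []
for c in np.unique(labels):
    Xc = X[labels == c]
    cols.append(np.sqrt(len(Xc) / len(X)) * (Xc.mean(axis=0) - m))
Phi_b = np.stack(cols, axis=1)                # shape (5, 3)

gram_from_kernel = between_class_gram(K, labels)
gram_explicit = Phi_b.T @ Phi_b
```

The resulting matrix has one row and column per class, so its eigenanalysis is cheap even when the feature space is very high dimensional.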

Optimization Criterion and Objective.
We adopt the maximum margin criterion (MMC) [26] as the objective function (14) to optimize the kernel parameter $\sigma$ for each specific module (forehead, eyes, nose, or lips), where $\mathbf{W}$ is a projection matrix and $\sigma$ is the parameter of the Gaussian RBF kernel as in (4).
Based on the result of (13) in Section 3.2, the optimal projection matrix $\mathbf{W}^*$ is determined by the matrix $\mathbf{G}$, which can be computed from the kernel matrix $\mathbf{K}$. The objective function (14) can then be reformulated in terms of $\mathbf{G}$ and two matrices $\mathbf{P}$ and $\mathbf{Q}$, which are inner-product matrices of the mapped data and can be expressed in terms of the kernel matrix $\mathbf{K}$ as in (16) and (17), with $\mathbf{D}$, $\mathbf{A}$, and $\mathbf{1}$ defined as in (11). Here $\mathbf{H} = \operatorname{diag}(\mathbf{h}_1, \ldots, \mathbf{h}_C)$ is an $N \times N$ block diagonal matrix, and $\mathbf{h}_i$ is an $N_i \times N_i$ matrix with all terms equal to $1/N_i$. Therefore, regarding $\mathbf{G}$ as constant, the objective function $J(\sigma)$ is an explicit function of the kernel parameter $\sigma$. To maximize the objective function with $\sigma_0$ as the initial value of the parameter, we develop an iterative procedure based on Newton's method to update the kernel parameter, as described in the following section.
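For orientation, the maximum margin criterion is usually stated as the trace of the projected difference between the two scatter matrices. The form below is the standard one from the MMC literature and is given here as an assumption about what (14) expresses; the notation $J(\mathbf{W}, \sigma)$ is illustrative.

```latex
\begin{equation}
J(\mathbf{W}, \sigma) = \operatorname{tr}\!\Bigl(\mathbf{W}^{T}
  \bigl(\mathbf{S}_b^{\Phi} - \mathbf{S}_w^{\Phi}\bigr)\mathbf{W}\Bigr)
\end{equation}
```

Both scatter matrices depend on $\sigma$ through the kernel, which is why fixing the projection coefficients leaves a scalar function of $\sigma$ to maximize. Unlike the Fisher ratio, this criterion needs no inverse of the within-class scatter matrix.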

Solving the Optimization Problem.
Assume that $\mathbf{G}$ is constant. To obtain the extremum of the objective function $J(\sigma)$, we need to differentiate $J(\sigma)$ with respect to $\sigma$.
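Each element of the matrix $\mathbf{K}_{\sigma}$ is the derivative of the corresponding element of the kernel matrix $\mathbf{K}$ with respect to $\sigma$. From (16) and (17), the matrices $\mathbf{P}$ and $\mathbf{Q}$ can then be differentiated with respect to $\sigma$ by replacing $\mathbf{K}$ with $\mathbf{K}_{\sigma}$ where the dependence is linear and applying the product rule elsewhere; $\partial\mathbf{P}/\partial\sigma$ and $\partial\mathbf{Q}/\partial\sigma$ have the same sizes as $\mathbf{P}$ and $\mathbf{Q}$, respectively.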
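The element-wise derivative of a Gaussian kernel matrix has a simple closed form, sketched below together with a finite-difference check. The kernel form $\exp(-r/(2\sigma^2))$, with $r$ the squared distance, is an assumption; under it the derivative with respect to $\sigma$ is the kernel value times $r/\sigma^3$. The function names are illustrative.

```python
import numpy as np

def pairwise_sq_dists(X):
    """Squared Euclidean distances between all rows of X."""
    sq = np.sum(X**2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

def gaussian_kernel(sq_dists, sigma):
    """K[a, b] = exp(-r_ab / (2 sigma^2)) (assumed kernel form)."""
    return np.exp(-sq_dists / (2.0 * sigma**2))

def gaussian_kernel_dsigma(sq_dists, sigma):
    """Element-wise dK/dsigma = K * r / sigma^3 for the kernel form above."""
    return gaussian_kernel(sq_dists, sigma) * sq_dists / sigma**3

rng = np.random.default_rng(2)
R = pairwise_sq_dists(rng.random((7, 4)))
sigma, h = 1.3, 1e-6

analytic = gaussian_kernel_dsigma(R, sigma)
numeric = (gaussian_kernel(R, sigma + h) - gaussian_kernel(R, sigma - h)) / (2.0 * h)
```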
The derivative of $J$ with respect to $\sigma$ can then be formulated and expressed in terms of the matrices $\mathbf{K}$ and $\mathbf{K}_{\sigma}$. To achieve the maximum of $J(\sigma)$, we set this derivative to zero, which gives (22). We use Newton's method to solve (22), starting from the initial value $\sigma_0$ of the kernel parameter, where $\sigma_0^2$ is the average squared distance of all the samples in the given module. The Newton update is then applied for iterations $t = 0, 1, 2, \ldots$.
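The iteration can be illustrated with a generic scalar Newton update, $\sigma_{t+1} = \sigma_t - J'(\sigma_t)/J''(\sigma_t)$. The sketch below is not the paper's implementation: it uses finite differences instead of the analytic derivatives, a toy concave objective instead of the MMC objective, and it reads "average squared distance of all the samples" as the mean pairwise squared distance, which is one possible interpretation. All names are hypothetical.

```python
import numpy as np

def initial_sigma(X):
    """sigma_0 with sigma_0^2 = mean pairwise squared distance (one reading)."""
    sq = np.sum(X**2, axis=1)
    sq_dists = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    off_diag = ~np.eye(len(X), dtype=bool)    # exclude zero self-distances
    return float(np.sqrt(sq_dists[off_diag].mean()))

def newton_maximize(J, sigma0, h=1e-4, tol=1e-8, max_iter=50):
    """Newton iteration on J'(sigma) = 0 using central finite differences."""
    sigma = sigma0
    for _ in range(max_iter):
        d1 = (J(sigma + h) - J(sigma - h)) / (2.0 * h)
        d2 = (J(sigma + h) - 2.0 * J(sigma) + J(sigma - h)) / h**2
        if d2 == 0.0:                         # flat curvature: cannot take a step
            break
        step = d1 / d2
        sigma -= step
        if abs(step) < tol:
            break
    return sigma

# Toy concave objective with its maximum at sigma = 2 (stand-in for J).
def toy_J(s):
    return -(s - 2.0)**2 - 0.1 * (s - 2.0)**4

sigma_star = newton_maximize(toy_J, sigma0=3.0)
sigma_init = initial_sigma(np.array([[0.0, 0.0], [3.0, 4.0]]))
```

Newton's method only finds a stationary point near the starting value, so a data-driven initial value such as $\sigma_0$ matters in practice.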

Identification Using Multiple Kernels.
For each of the four modules (forehead, eyes, nose, and lips), the optimization of the Gaussian kernel parameter is run separately, as described above, to find the optimal value. Then the four submodules of a testing image are fed into the corresponding kernel discriminant classifiers to compute the membership degree of the image with respect to every subject according to (1). Finally, the image is assigned to the subject with the greatest membership degree.

Simulation
To evaluate the performance of our multiple kernel framework for distributed wireless face recognition, we compare it experimentally with KDDA based on the Gaussian RBF kernel, in terms of recognition accuracy. Images are from two face databases, namely, the FERET and the CMU PIE databases.
In our experiments, for the weight values in the fusion criterion (1), we set $w_1 = 0.1$, $w_2 = 0.3$, $w_3 = 0.4$, and $w_4 = 0.2$, meaning that we give larger weights to the nose module and the eyes module.

Face Image Datasets.
From the FERET database [27], we select 72 people, with 6 frontal-view images for each individual. Face image variations in these 432 images include illumination, facial expression, wearing glasses, and aging. All the images are aligned by the centers of the eyes and the mouth and then normalized to a resolution of 92 × 112. The pixel values of each image are normalized between 0 and 1. The original images with resolution 92 × 112 are reduced to wavelet feature faces with resolution 49 × 59 after 1-level Daubechies-4 (Db4) wavelet decomposition. Images from one individual are shown in Figure 3.
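The reduced resolution can be checked with a short calculation. The sketch below assumes the usual non-periodized boundary handling, under which one decomposition level maps a signal of length $n$ to $\lfloor (n + L - 1)/2 \rfloor$ coefficients, and assumes a Db4 filter length of $L = 8$; the helper name is hypothetical.

```python
def dwt_output_length(n, filter_length):
    """Coefficient count after one DWT level with non-periodized extension."""
    return (n + filter_length - 1) // 2

DB4_FILTER_LENGTH = 8  # Daubechies-4 has 8 filter taps

width = dwt_output_length(92, DB4_FILTER_LENGTH)
height = dwt_output_length(112, DB4_FILTER_LENGTH)
reduced_shape = (width, height)
```

Under these assumptions the 92 × 112 images map to 49 × 59, matching the resolution stated above.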
In the CMU PIE face database [28], there are a total of 68 people, and each person has 13 pose variations, ranging from the full right profile image to the full left profile image, and 43 different lighting conditions, with 21 flashes and ambient light on or off. In our experiments, for each person, we select 56 images, including 13 poses with neutral expression and 43 different lighting conditions in the frontal view. For all frontal-view images, we apply alignment based on the two eye centers and the nose center, and no alignment is applied to the other images with pose variations. All the segmented images are rescaled to a resolution of 92 × 112 and then reduced to wavelet feature faces with resolution 49 × 59 after 1-level Daubechies-4 (Db4) wavelet decomposition. Some images of one person are shown in Figure 4.

Recognition Results.
This section reports the recognition results of the proposed multiple kernel framework and KDDA with a single Gaussian RBF kernel on the FERET and the CMU PIE datasets. For KDDA, the parameter of the Gaussian RBF kernel is optimized via grid search. For each subject in the FERET dataset, we randomly select 2 to 5 out of 6 images for training, with the rest for testing. In the CMU PIE dataset, the number of randomly selected training images ranges from 10 to 18 out of 56 for each individual, while the rest are testing images. The average recognition accuracies over 10 runs on the FERET and CMU PIE datasets are shown in Figure 5. Table 1 shows the average and standard deviation of the accuracies for FERET (4 images per subject for training, with the rest for testing) and CMU PIE (14 images per subject for training, with the rest for testing), respectively.
From the results in Figure 5 and Table 1, it can be seen that the proposed multiple kernel framework achieves higher accuracies than KDDA with an optimized parameter.

Conclusion
In this paper, on the assumption that multiple kernels can characterize the geometrical structures of the original data from multiple views, which complement each other to improve recognition performance, we integrate multiple kernel discriminant analysis with face recognition in wireless sensor networks and propose a module-based distributed wireless face recognition system. For each module, we separately perform an iterative scheme based on Newton's method for kernel parameter optimization, by maximizing the maximum margin criterion. The multiple kernel framework and the optimization procedure yield higher recognition accuracy on the FERET and CMU PIE face databases than a single kernel.

Figure 3: Images of one person from the FERET database.

Figure 4: Some images of one person from the CMU PIE face database.

Figure 5: Comparison of accuracies obtained by the multiple kernel framework and KDDA with a single kernel.

Table 1: Performance comparison between the multiple kernel framework and KDDA with a single kernel. The bold value indicates the higher average accuracy of the two methods.