Face Recognition Based on Optimized Projections for Distributed Intelligent Monitoring Systems

Compressive sensing (CS), a relatively new theory of signal processing, has found many applications. This paper deals with the design of a CS-based face recognition system. A novel framework, called projection matrix optimization- (PMO-) based compressive classification, is proposed for distributed intelligent monitoring systems. Unlike the sparse preserving projection (SPP) approach, the projection matrix is designed such that the coherence between different classes of faces is reduced, and hence a higher recognition rate is expected. The optimal projection matrix problem is formulated as identifying a matrix that minimizes the Frobenius norm of the difference between a given target Gram and that of the equivalent dictionary. A class of analytical solutions is derived. With the PMO-based CS system, two frameworks are proposed for compressive face recognition. Experiments are carried out with five widely used face databases (i.e., ORL, Yale, Yale Extend, CMU PIE, and AR), and simulation results show that the proposed approaches outperform existing compressive ones in terms of recognition rate and reconstruction error.


Introduction
Face recognition (FR) plays a very important role in multimedia-based applications. Despite many years of research, it remains an interesting and challenging research area [1]. Figure 1 depicts the conventional face recognition process. As it involves storing and transmitting high dimensional images, image compression techniques such as JPEG and JPEG2000 are used to alleviate the problem [2,3]. At the receiver end, users have to decompress (reconstruct) the images and extract image features for classification. Such a procedure usually requires a large amount of computation and hence makes the systems expensive. It can be greatly simplified if the images are acquired using compressive sensing, which directly outputs extracted image features, so that classification and reconstruction are done with these features (see Figure 2).
With the development of the Internet of Things and information technology, the demand for distributed intelligent image monitoring systems, as shown in Figure 3, is increasing greatly. Such systems need intelligent end devices capable of both sensing and classifying. The cameras are connected through the wireless Internet. The main problems with such a system are the transmission and storage of the images and the cost, including the infrastructure and the communications. To implement this architecture, several technical factors must be considered: data storage space, RAM space, computation time, and transmission bandwidth. The key to solving these issues is to develop a data compression algorithm that integrates signal reconstruction and classification with acceptable performance.
Dimensionality reduction plays an important role in the analysis of high dimensional data. In recent years, many dimensionality reduction methods have been successfully applied in pattern recognition [4,5]. Principal component analysis (PCA) represents faces by projecting the facial images onto the directions of maximal covariance in the facial image data. PCA reduces the dimensionality of the data, but it ignores the relationship between data in high dimensions. Linear discriminant analysis (LDA) explicitly attempts to model the difference between the classes of data. The Fisherface method combines PCA and the Fisher criterion to extract the information that discriminates between the classes of a sample set. The projection matrix is chosen to maximize the ratio between the determinant of the between-class scatter matrix of the projected samples and that of the within-class scatter matrix. Nevertheless, Martinez et al. demonstrated that when the training data set is small, the eigenface method outperforms the Fisherface method. Some more recent algorithms attempt to reduce the data dimensionality while keeping the intrinsic characteristics. The locality preserving projection (LPP) aims to find an embedding that preserves local information and to obtain a face subspace that best detects the essential face manifold structure. LPP finds a good projection direction when the distances between classes are large, and it keeps the local structure of the data very well. When two classes are close or even partially overlapping, however, it cannot classify effectively, precisely because it preserves local information. The recently developed compressed sensing (CS) is a signal processing technique that can acquire a signal efficiently and reconstruct it by finding the solution to an underdetermined linear system [6,7].
Its essence is to discretize analog signals with integrated sampling and compression, namely, analog-to-discrete CS. The basic principle of such a CS framework is similar to that of discrete-to-discrete CS [8,9], which is explained below. In the standard CS framework, it is assumed that a high dimensional signal $\mathbf{x} \in \mathbb{R}^{N \times 1}$ can be represented as a linear combination of vectors $\{\boldsymbol{\psi}_l\}$:

$$\mathbf{x} = \sum_{l=1}^{L} s_l \boldsymbol{\psi}_l = \Psi \mathbf{s}, \tag{1}$$

where $\Psi \in \mathbb{R}^{N \times L}$ is known as the dictionary (matrix), while $\mathbf{s}$ is a $\|\mathbf{s}\|_0$-sparse vector, with $\|\mathbf{s}\|_0$ denoting the number of nonzero elements of $\mathbf{s}$, and the corresponding $\mathbf{x}$ is said to be $\|\mathbf{s}\|_0$-sparse in the dictionary $\Psi$. The basic mathematical problem of CS is to study how to reconstruct the original high dimensional signal $\mathbf{x}$ from its low dimensional projection $\mathbf{y}$, which is mathematically of the form

$$\mathbf{y} = \Phi \mathbf{x}, \tag{2}$$

where $\Phi \in \mathbb{R}^{M \times N}$ with $M < N$ is called a projection matrix (it is also called a measurement or sensing matrix; these terms are used interchangeably in this paper). Signal reconstruction means finding $\mathbf{x}$ from (2) with $\mathbf{y}$ and the pair $(\Phi, \Psi)$ given. There are two conditions under which recovery is possible. The first one is sparsity, which requires the signals $\mathbf{x}$ to be sparse in some $\Psi$. The signal reconstruction problem is given as

$$\hat{\mathbf{s}} = \arg\min_{\mathbf{s}} \|\mathbf{s}\|_0 \quad \text{s.t.} \quad \mathbf{y} = A\mathbf{s}, \tag{3}$$

where $A \triangleq \Phi\Psi$ is called the equivalent dictionary. The solution to (3) is unique for sparse signals if $A$ satisfies the restricted isometry property (RIP). See [6,7].
The second condition is related to the mutual coherence [10,11] of the equivalent dictionary $A$, which is defined as

$$\mu(A) \triangleq \max_{1 \le i \ne j \le L} \frac{|A(:,i)^{T} A(:,j)|}{\|A(:,i)\|_2 \, \|A(:,j)\|_2}, \tag{4}$$

where $T$ denotes the transpose operator. $\mu(A)$ represents the worst-case coherence between any two atoms of $A$. As shown in [10], a $K$-sparse signal can be exactly recovered from the measurement as long as

$$K < \frac{1}{2}\left(1 + \frac{1}{\mu(A)}\right). \tag{5}$$

As seen, the smaller the value of $\mu(A)$ is, the bigger the value of $K$ that is allowed. The latter implies a wider range of signals that can be recovered exactly using such a CS system. So, minimizing $\mu(A)$ is of importance. Simulations showed that the signal reconstruction accuracy depends more on the number of strongly correlated columns than on $\mu(A)$. Noting this fact, Elad in [11] proposed to minimize the averaged mutual coherence with respect to $\Phi$ for a given dictionary $\Psi$. Since then, many different approaches have appeared. Roughly speaking, these approaches are all based on the same framework: design the sensing matrix $\Phi$ such that the Gram matrix of $A$, defined as $A^{T}A$, is as close to a target Gram matrix $G_t$ as possible in the sense of

$$\min_{\Phi,\; G_t \in \mathcal{S}} \|G_t - \Psi^{T}\Phi^{T}\Phi\Psi\|_F^2, \tag{6}$$

where $\|\cdot\|_F$ denotes the Frobenius norm, $\mathcal{S}$ is a class of Gram matrices possessing certain properties, and the dictionary is assumed to be given. See [12][13][14][15]. As the diagonal elements of $G_t$ are all assumed to be one, the Frobenius norm-based error above actually represents the averaged coherence, except for a constant factor $1/(L(L-1))$, if all the diagonal entries of $A^{T}A$ are equal to one. The properties of the CS systems designed using (6) are strongly related to the choice of the target Gram $G_t$. A column normalized matrix $A \in \mathbb{R}^{M \times L}$ is said to be an equiangular tight frame (ETF) if $|A(:,i)^{T}A(:,j)| = \mu_w$ for all $i \ne j$, where $\mu_w \triangleq \sqrt{(L-M)/(M(L-1))}$ is the smallest mutual coherence that an $M \times L$ matrix can possibly have. Such a problem has been investigated for $G_t$ being the identity matrix, and some analytical solutions are available [13].
In the ETF-based approach, the target Gram is taken as that of a relaxed ETF matrix, and the obtained sensing matrix shows an improved performance [13,15]. Developing algorithms to solve (6) for arbitrary $G_t$ is still an interesting topic.
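The coherence quantities discussed above are easy to evaluate numerically. The following is a minimal sketch, not taken from the paper: the function names `mutual_coherence`, `max_recoverable_sparsity`, and `gram_error` are hypothetical, and the code simply evaluates the mutual coherence of a matrix, the largest sparsity level allowed by the coherence bound, and the squared Frobenius distance between a target Gram and the Gram of $A$.

```python
import numpy as np

def mutual_coherence(A):
    """Largest absolute normalized inner product between two distinct columns."""
    A = np.asarray(A, dtype=float)
    An = A / np.linalg.norm(A, axis=0, keepdims=True)  # column-normalize
    G = np.abs(An.T @ An)                              # Gram of normalized atoms
    np.fill_diagonal(G, 0.0)                           # ignore self-correlations
    return float(G.max())

def max_recoverable_sparsity(mu):
    """Largest integer K satisfying the strict bound K < (1 + 1/mu) / 2."""
    bound = 0.5 * (1.0 + 1.0 / mu)
    return max(int(np.ceil(bound)) - 1, 0)

def gram_error(A, G_t):
    """Squared Frobenius distance between a target Gram and the Gram of A."""
    return float(np.linalg.norm(G_t - A.T @ A, "fro") ** 2)

# Three unit-norm atoms in the plane: e1, e2, and their normalized sum.
r = 1.0 / np.sqrt(2.0)
A = np.array([[1.0, 0.0, r],
              [0.0, 1.0, r]])
mu = mutual_coherence(A)
print(mu, max_recoverable_sparsity(mu), gram_error(A, np.eye(3)))
```

With the identity as target Gram, `gram_error` sums the squared off-diagonal correlations, which is the averaged-coherence view described above up to a constant factor.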
Sparse representation (SR), an important prerequisite for CS theory, has been applied in face classification [16]. Such an approach, usually referred to as SRC (sparse representation classification), can yield a higher recognition rate under varying illumination and expression. With SRC, recognition is converted to the problem of classification among multiple linear regression models. The desired representation is sparse since the test sample should only be represented in terms of training samples belonging to the same class. The sparse representation can be computed with $\ell_1$ minimization [6]. However, this algorithm is very time-consuming for large scale databases with images of high resolution. In general, the recognition rate degrades as the feature dimension is reduced [16]. Sparse preserving projection (SPP), proposed in [17], builds graphs based on the sparse reconstruction of data. Such a technique tries to preserve the sparsity without considering the coherence issue, which would affect the face recognition rate.
Sparse representation works well in applications where the original signal x needs to be reconstructed as accurately as possible, such as denoising, image inpainting, and coding. However, sometimes we just need to discriminate the signal from its representation rather than reconstructing it. The difference between reconstruction and discrimination has been widely investigated in the literature. It is known that typical reconstructive methods, such as PCA and independent component analysis (ICA), aim at obtaining a representation that enables sufficient reconstruction, and thus they are able to deal with signal corruption, that is, noise, missing data, and outliers. On the other hand, discriminative methods, such as LDA, generate a signal representation that maximizes the separation of distributions of signals from different classes. While both classes of methods have broad applications in classification, the discriminative methods, as expected, have often outperformed the reconstructive methods for the classification task.
The main objective of this paper is to develop an algorithm for face classification and reconstruction, which is intended to be used in distributed monitoring systems and to be implemented on a low-cost microprocessor platform (e.g., ARM Cortex-M3 and ARM Cortex-M4). The proposed algorithm is projection matrix optimization- (PMO-) based compressive classification, which designs the projection matrix $\Phi$ in such a way that the within-class coherence is enhanced while the between-class coherence is reduced. The receiving end uses the sparse representation coefficients in the equivalent dictionary for image reconstruction and classification. Precisely speaking, the main contributions of this paper are as follows:
(i) A new face recognition framework oriented to distributed monitoring systems is proposed based on compressed sensing. Instead of the high dimensional original images, their lower dimensional counterparts obtained using projection are transmitted and used for reconstruction and recognition. A new target Gram, denoted by $G_t$, is proposed for designing the projection matrix in order to improve the discrimination between classes.
(ii) The optimal projection matrix design problem is formulated in terms of identifying those sensing matrices that minimize the difference between the Gram of the equivalent dictionary and the proposed target Gram $G_t$ in the Frobenius norm sense. A class of analytical solutions is derived for the proposed problem, which is a generalization of that in [14].
(iii) With the PMO-based CS system, two frameworks are proposed for face recognition. Experiments are carried out and the results confirm that the proposed approaches can effectively improve the system performance in terms of face classification and reconstruction.
International Journal of Distributed Sensor Networks
The paper is outlined as follows. Section 2 is devoted to providing some existing works on compressive classification and recognition, which are closely related to ours. Our main contribution is given in Section 3, in which the PMO for compressive classification problem is formulated and an algorithm is derived to solve this problem. With the obtained PMO-based CS systems, two FR frameworks are proposed for distributed intelligent monitoring systems. Experiments are carried out in Section 4 to examine the performance of the proposed approaches and to compare them with some of the prevailing ones. To end this paper, some concluding remarks are given in Section 5.

Related Works and Problem Formulation
This section reviews sparse representation classification and some projection-based SRC methods, which are considered successful applications of sparse representation and compressive sensing theory to face recognition and are closely related to the work developed in this paper.

Sparse Representation Classification.
In [18], the sparse representation is shown to achieve state-of-the-art performance in image denoising. Such a technique was also used as an inpainting method in [19] for recovering missing pixels in images. It was extended to face recognition in [16].
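Suppose we have $C$ classes of samples; each class has a set of samples of the same size, from which we randomly select $n$ samples for training purposes. Each sample forms a vector of dimension $N \times 1$ and is scaled to unit $\ell_2$-norm, yielding an atom of the dictionary $\Psi_i$:

$$\Psi = [\Psi_1 \ \cdots \ \Psi_i \ \cdots \ \Psi_C], \tag{7}$$

where $\Psi_i \in \mathbb{R}^{N \times n}$ is the dictionary for the $i$th class.
In the standard SR framework, it is assumed that the original face image signal $\mathbf{x} \in \mathbb{R}^{N \times 1}$ can be represented as a linear combination of the atoms $\{\boldsymbol{\psi}_l\}$:

$$\mathbf{x} = \sum_{l=1}^{L} s_l \boldsymbol{\psi}_l = \Psi\mathbf{s}, \tag{8}$$

where $L = Cn$ and $\mathbf{s}$ is sparse. The procedure for classifying a given $\mathbf{x}$ contains the following steps [16].
Step 1. Normalize $\mathbf{x}$ in $\ell_2$:

$$\mathbf{x} \leftarrow \frac{\mathbf{x}}{\|\mathbf{x}\|_2}. \tag{9}$$

Step 2. Find the sparse vector $\hat{\mathbf{s}}$ with

$$\hat{\mathbf{s}} = \arg\min_{\mathbf{s}} \|\mathbf{s}\|_1 \quad \text{s.t.} \quad \mathbf{x} = \Psi\mathbf{s}. \tag{10}$$

Such a problem can be solved efficiently using a linear programming technique, as the constraint is also linear.
Step 3 (compute the residual energy). Denote by $\hat{\mathbf{s}}_i \in \mathbb{R}^{n \times 1}$ the subvector of $\hat{\mathbf{s}} \in \mathbb{R}^{L \times 1}$ corresponding to the $i$th dictionary $\Psi_i$. Calculate

$$r_i \triangleq \|\mathbf{x} - \Psi_i \hat{\mathbf{s}}_i\|_2, \quad i = 1, \ldots, C. \tag{11}$$

Step 4 (classification). $\mathbf{x}$ belongs to the $\hat{i}$th class, where $\hat{i}$ is determined with

$$\hat{i} \triangleq \arg\min_{i \in \{1, \ldots, C\}} r_i. \tag{12}$$

The input-output relationship of this algorithm is simply denoted by

$$\hat{i} = \mathrm{Alg}^{\mathrm{SRC}}(\mathbf{x}, \Psi), \tag{13}$$

where the dictionary $\Psi$ is $\ell_2$-normalized.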
It should be pointed out that the SRC described above works on high dimensional signals x. See (10). In the remainder of this section, we will present some methods that carry out the SRC in lower dimensionality domain by working with the projection y of original x via (2); that is, y = Φx. Such a class of classification techniques is referred to as compressive SRC. Therefore, a method in this class differs from another by the ways in which the projection matrix Φ is designed.
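The SRC steps can be sketched in a few lines. This is only an illustration under stated assumptions: the sparse coding step uses orthogonal matching pursuit (OMP) as a greedy stand-in for the $\ell_1$ minimization, so it is not the solver of [16], and the names `omp`, `src_classify`, `labels`, and the sparsity budget `k` are hypothetical.

```python
import numpy as np

def omp(D, x, k, tol=1e-10):
    """Orthogonal matching pursuit: greedy substitute for the l1 step."""
    residual = x.copy()
    support = []
    coef = np.zeros(0)
    s = np.zeros(D.shape[1])
    for _ in range(k):
        if np.linalg.norm(residual) <= tol:
            break                                  # x is already explained
        j = int(np.argmax(np.abs(D.T @ residual)))  # most correlated atom
        if j in support:
            break
        support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    if support:
        s[support] = coef
    return s

def src_classify(Psi, labels, x, k=5):
    """Normalize x, sparse-code it, then pick the class with least residual."""
    x = x / np.linalg.norm(x)                       # Step 1
    s = omp(Psi, x, k)                              # Step 2 (OMP stand-in)
    classes = np.unique(labels)
    residuals = np.array([
        np.linalg.norm(x - Psi[:, labels == c] @ s[labels == c])  # Step 3
        for c in classes
    ])
    return classes[int(np.argmin(residuals))], residuals          # Step 4

# Toy dictionary: 3 classes with 4 unit-norm atoms each (random stand-ins for faces).
rng = np.random.default_rng(0)
Psi = rng.standard_normal((30, 12))
Psi /= np.linalg.norm(Psi, axis=0)
labels = np.repeat(np.arange(3), 4)
pred, res = src_classify(Psi, labels, 2.0 * Psi[:, 5])
print(pred, res)
```

The same function can be applied in the compressed domain by passing $A = \Phi\Psi$ and $\mathbf{y} = \Phi\mathbf{x}$ in place of $\Psi$ and $\mathbf{x}$.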

PCA-Based SRC.
The PCA provides a dimensionality reduction technique to represent the high dimensional signal x with a lower dimensional one y obtained using (2). The projection matrix Φ is determined as follows.
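By singular value decomposition (SVD), R can be rewritten as

$$R = U \Sigma U^{T},$$

where $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_k, \ldots, \sigma_N)$ with the principal components satisfying $\sigma_k \ge \sigma_{k+1}$, $\forall k$. Under the PCA, the optimal projection matrix $\Phi \in \mathbb{R}^{M \times N}$ with $M \le N$ is given by the first $M$ columns of the orthonormal matrix $U$:

$$\Phi = U(:, 1{:}M)^{T}.$$

The original signal $\mathbf{x}$ is projected into $\mathbf{y} = \Phi\mathbf{x}$.
The PCA-based SRC approach, denoted by $\mathrm{Alg}^{\mathrm{SRC}}_{\mathrm{PCA}}$, is a classification based on $\mathbf{y}$ rather than $\mathbf{x}$.
Step 3. Determine the class that $\mathbf{x}$ belongs to by applying the SRC procedure to $\mathbf{y}$ with the equivalent dictionary $A = \Phi\Psi$. Such a procedure was proposed in [16].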
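The PCA projection described above can be sketched as follows. This is a minimal illustration, assuming that R is the sample correlation matrix $XX^{T}$ of the training samples stored as columns of X (the text does not restate the definition of R here, so this choice is an assumption); the function name `pca_projection` is hypothetical.

```python
import numpy as np

def pca_projection(X, M):
    """Return Phi (M x N) whose rows are the M leading singular vectors of R."""
    R = X @ X.T                        # assumption: R = X X^T (N x N)
    U, sigma, _ = np.linalg.svd(R)     # singular values come sorted, largest first
    return U[:, :M].T

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 6))       # 6 toy samples of dimension N = 10
Phi = pca_projection(X, 3)
Y = Phi @ X                            # projected samples, one column per sample
print(Phi.shape, Y.shape)
```

Because the rows of the returned matrix are orthonormal, the projection preserves as much of the sample energy as any rank-$M$ linear projection can.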

SPP-Based SRC.
The SPP proposed in [17] aims at carrying out the classification/recognition in a lower dimensional space. Precisely speaking, it works with signals $\mathbf{y}$ obtained using (2), where the projection matrix $\Phi$ belongs to $\mathbb{R}^{M \times N}$ with $M < N$.
In such a framework, the $l$th sample $\mathbf{x}_l$ in the dictionary is featured by its sparse vector $\mathbf{s}_l$ obtained with (19), in which $\mathbf{1} \in \mathbb{R}^{L \times 1}$ is the column vector whose entries are all equal to 1 and $\mathcal{S}_l$ is the vector space of dimension $L$ in which each of the vectors has its $l$th element equal to zero. With $S \triangleq [\mathbf{s}_1 \cdots \mathbf{s}_l \cdots \mathbf{s}_L]$, (20) defines the sparse representation error matrix in the sense specified by (19). The projection matrix is chosen such that each row vector of $\Phi$ minimizes the sparse representation errors [17], as stated in (21), where the constraint is for avoiding degenerate solutions. Using the Lagrange multiplier approach, one can find the solution to (21) from the generalized eigenvalue problem (22), and the row vectors of the optimal $\Phi$ are given by the transposed eigenvectors corresponding to the top $M$ eigenvalues of (22). Equation (22) can be solved using the MATLAB command eig, which yields the eigenvalue matrix $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_k, \ldots, \lambda_N)$ with $\lambda_k \ge \lambda_{k+1}$ assumed. Then, the optimal projection matrix $\Phi \in \mathbb{R}^{M \times N}$ for SPP is given by the first $M$ row vectors of $V^{T}$, as in (24). As noted in [17], to avoid singularity of this problem, PCA-based preprocessing is usually used (the SVD of $XX^{T}$ is computed and $X$ is replaced with the resulting reduced matrix $\tilde{X}$).
As understood, the SPP-based SRC method, denoted by $\mathrm{Alg}^{\mathrm{SRC}}_{\mathrm{SPP}}$, is exactly the same as $\mathrm{Alg}^{\mathrm{SRC}}_{\mathrm{PCA}}$ except that the projection matrix $\Phi$ is given by (24).

Sparse Related Face Recognition.
In [20], a class of structured sparsity-inducing norms was included into SRC framework which mainly concerns the misalignment, shadow, and occlusion scenarios. But the authors did not take dimensionality reduction into account which is crucial for the implementation of distributed systems. SRC assumes that the sparse representation residual follows Gaussian or Laplacian distribution, while robust sparse coding (RSC) considered in [21] models the sparse coding as a sparsity constrained robust regression problem and seeks the maximum likelihood estimation solution of the sparse coding. All these make it much more robust than SRC. However, the computational complexity of RSC is also much higher than SRC's which is not practical for the investigated distributed system.

Problem Formulation.
In a distributed intelligent monitoring system, the $N_1 \times N_2$ images acquired by the agents are usually of high dimension and have to be transmitted to the back-end server for recognition. For efficient transmission, it is desirable to compress these high dimensional signals $\{\mathbf{x}_k\}$ with $\mathbf{x}_k \in \mathbb{R}^{N \times 1}$, where $N \triangleq N_1 N_2$. One way to do so is through the projection $\mathbf{y}_k = \Phi\mathbf{x}_k$, where the projection/sensing matrix $\Phi \in \mathbb{R}^{M \times N}$ with $M \ll N$ compresses $\mathbf{x}_k$ into a much lower dimensional signal $\mathbf{y}_k$. Instead of $\mathbf{x}_k$, $\mathbf{y}_k$ is transmitted.
Once $\mathbf{y}_k$ is received by the back-end server, the recognition of $\mathbf{y}_k$ (corresponding to the $k$th image) is done with the equivalent dictionary $A$:

$$A = \Phi\Psi, \tag{25}$$

where the dictionary $\Psi$, as formed in (7), is assumed to be given. The main problem to be investigated in the remainder of this paper is how to design the sensing matrix $\Phi$ such that the image can be recognized with the measurement $\mathbf{y}_k$ and the equivalent dictionary $A$.

The Proposed CS System for FR Distributed Monitoring Systems
As mentioned before, the key to solving the problems encountered in distributed intelligent monitoring systems is to develop a compression scheme that integrates signal reconstruction and classification. $\mathrm{Alg}^{\mathrm{SRC}}_{\mathrm{PCA}}$ and $\mathrm{Alg}^{\mathrm{SRC}}_{\mathrm{SPP}}$, described in Section 2, can be viewed as two special CS systems in which the sensing matrix is designed to preserve the principal components of the signal covariance matrix and the sparse representation of the samples in the dictionary $\Psi$, respectively.
As seen in the previous section, the optimal sensing matrix design problem is formulated by (6). Such a problem has been studied intensively for signal compression application, in which achieving high reconstruction accuracy is the ultimate goal of the design.
In the context of face recognition, what we are most interested in is the recognition rate. Therefore, it is desired to design the projection matrix $\Phi$ to enhance the recognition rate. This is strongly related to how the target Gram $G_t$ is chosen. Another important issue is how to solve (6) with arbitrary $G_t$.

How to Choose the Target Gram?
Optimizing the projection matrix for CS systems has been an important research topic in CS theory, which was initiated by Elad's work reported in 2007 [11]. Since then, many theoretical achievements have been obtained [12][13][14][15]. Roughly speaking, all these results are based on the following formulation:

$$\min_{\Phi} \|G_t - G\|_F^2, \tag{26}$$

where $\|\cdot\|_F$ is the Frobenius norm, $G = A^{T}A$ is the Gram matrix of the equivalent dictionary $A = \Phi\Psi$, and $G_t$ is a target Gram matrix.
For a given dictionary, the performance of the projection matrix depends on the choice of $G_t$. When a CS system is designed for signal compression, reconstruction accuracy is the ultimate goal. In such a case, $G_t$ is chosen such that the resultant projection matrix makes the equivalent dictionary $A$ have small coherence between its columns and makes the system robust against noise, say sparse representation errors of signals. Recently, Cleju showed in [14] that images can be well reconstructed from the low dimensional measurements projected with the sensing matrix $\Phi$ designed based on $G_t = \Psi^{T}\Psi$, the Gram of the dictionary.
Comment 1. It should be pointed out that, in image compression oriented CS systems, the images are divided into small subimages, each of which forms a signal vector $\mathbf{x}$ whose dimension equals the number of pixels in a subimage (this is very different from what has been used in the SRC-based face recognition approaches, in which the signals are of dimension $N_1 N_2 \times 1$), while the constructive dictionary $\Psi$ has many more columns than rows. The K-SVD [18] has been considered as one of the most successful methods for designing such a dictionary. It was observed that the dictionaries obtained with different classes of images/samples are very similar. Our experiments showed that a direct application of such a CS system to face recognition based on reconstruction error does not yield a satisfactory recognition rate. Note that, for face recognition, the ultimate objective is accurate recognition; to enhance the recognition rate, it would be better to design the sensing matrix $\Phi$ by improving the discrimination between the classes at the price of sacrificing reconstruction accuracy. This is one of the main motivations for the approach proposed in this paper, which is developed further below.
Let $\Psi = [\Psi_1 \ \cdots \ \Psi_C] \in \mathbb{R}^{N \times L}$ be the overall dictionary defined before for face recognition. Its Gram is

$$G_\Psi \triangleq \Psi^{T}\Psi = [G_{ij}], \tag{27}$$

with $G_{ij} \triangleq \Psi_i^{T}\Psi_j$.
When choosing $\Phi$ such that $\|G_\Psi - G\|_F^2$ is minimized, one emphasizes the reconstruction accuracy, as the fundamental assumption of the SRC-based face recognition approach is that a sample (image) belonging to a class is close to a linear combination of a few atoms of the dictionary of the same class. This, however, cannot ensure that such a sample is not also well represented sparsely as a linear combination of atoms from other (sub)dictionaries. In other words, the reconstruction error itself may not be a proper measure for recognition. As mentioned before, the recognition rate is our primary goal, and hence in the CS system design it is desired to choose the projection matrix $\Phi$ such that the coherence between the columns of $A_i$ and those of $A_j$ for all $i \ne j$ is reduced. By doing so, the discrimination between classes is improved. This can be achieved by choosing the sensing matrix such that the Gram of the corresponding equivalent dictionary is as close as possible to the target Gram $G_t$ given by (28), whose construction involves two positive parameters, called correction parameters, that adjust the coherence between different classes and that between the columns of the same class, respectively. These parameters should be chosen such that the off-diagonal elements of $G_t$ are all within $[0, 1]$. Such a target Gram $G_t$ is called the discriminative Gram.
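Whether a given sensing matrix actually separates the classes can be checked by comparing the coherence within classes to that between classes. The following is a small diagnostic sketch, not part of the paper: the function name `class_coherence` and the `labels` array (one class index per atom) are hypothetical, and each class is assumed to hold at least two atoms.

```python
import numpy as np

def class_coherence(A, labels):
    """Return (max within-class, max between-class) absolute atom correlations."""
    labels = np.asarray(labels)
    An = A / np.linalg.norm(A, axis=0, keepdims=True)   # unit-norm atoms
    G = np.abs(An.T @ An)                               # absolute Gram entries
    same = labels[:, None] == labels[None, :]           # same-class mask
    off_diag = ~np.eye(len(labels), dtype=bool)         # exclude self-pairs
    within = G[same & off_diag].max()
    between = G[~same].max()
    return float(within), float(between)

# Two classes with two atoms each.
A = np.array([[1.0, 0.6, 0.0, 0.0],
              [0.0, 0.8, 0.0, 0.6],
              [0.0, 0.0, 1.0, 0.8]])
print(class_coherence(A, [0, 0, 1, 1]))
```

A design that follows the idea above would be expected to lower the second number relative to the first when evaluated on $A = \Phi\Psi$.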

Comment 2.
(i) In [12], sensing matrix optimization for block-sparse decoding was investigated. Our problem is the simplest block-sparse case, where the blocks are prefixed according to the classes of face images. $G_t$ is taken as the identity matrix in [12], while in our problem it is only assumed to be symmetric. Both approaches face the question of how to choose the weighting factors, which is application dependent and purely empirical. We will present some experimental results in the next section.
(ii) It should be pointed out that although, as will be seen, the recognition/classification is done based on the errors $\|\mathbf{y} - A_i\mathbf{s}\|_2$, which have a physical meaning close to the reconstruction error, these errors take the discrimination between classes into account via all $\{A_i\}$, that is, via the sensing matrix, which is designed based on the discriminative Gram $G_t$ defined in (28). So, such a measure can be considered as a mixture of reconstruction error and discriminative error. It is worth noting that, in [22], a dictionary learning-based classification scheme was investigated in which the dictionary, trained on a certain class of signals, serves as the classifier, while in this paper we consider the classification task in the compressive domain. The sensing matrix is designed according to the given dictionary, and hence the equivalent dictionary (corresponding to the compressive domain) has good properties, so that the sparse coding can be implemented accurately.
In the next subsection, we will discuss how to solve the proposed optimal sensing matrix design problem.

An Algorithm for Optimizing Projection Matrix.
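With the target Gram $G_t$ defined in (28), one can now consider the optimal projection matrix design, which is formulated as follows:

$$\hat{\Phi} \triangleq \arg\min_{\Phi} \|G_t - \Psi^{T}\Phi^{T}\Phi\Psi\|_F^2. \tag{31}$$

Now, let us consider how to solve the above problem. First of all, assume that $\Psi$ has the following SVD:

$$\Psi = U_\Psi \begin{bmatrix} \Sigma_\Psi & 0 \\ 0 & 0 \end{bmatrix} V_\Psi^{T},$$

where $\Sigma_\Psi \in \mathbb{R}^{\tilde{N} \times \tilde{N}} > 0$. Then, with $W \triangleq U_\Psi^{T}\Phi^{T}\Phi U_\Psi$ partitioned into blocks $W_{11}$, $W_{12}$, $W_{21}$, and $W_{22}$ conformably with the SVD, we get

$$\Psi^{T}\Phi^{T}\Phi\Psi = V_\Psi \begin{bmatrix} \Sigma_\Psi W_{11} \Sigma_\Psi & 0 \\ 0 & 0 \end{bmatrix} V_\Psi^{T}.$$

Then, with $\tilde{G} \triangleq V_\Psi^{T} G_t V_\Psi$ partitioned in the same way, the cost function can be rewritten as

$$\|G_t - \Psi^{T}\Phi^{T}\Phi\Psi\|_F^2 = \|\tilde{G}_{11} - \Sigma_\Psi W_{11} \Sigma_\Psi\|_F^2 + \|\tilde{G}_{12}\|_F^2 + \|\tilde{G}_{21}\|_F^2 + \|\tilde{G}_{22}\|_F^2.$$

We can see that all terms on the right-hand side other than the first have nothing to do with the projection matrix. Define $\tilde{W}_{11} \triangleq \Sigma_\Psi W_{11} \Sigma_\Psi$. A lower bound on the first term is then derived, together with the necessary and sufficient condition under which the bound is achieved. Noting that $\mathrm{rank}(W_{11}) \le M$, one can see that the lower bound can be minimized. With (41) and (42), one has the optimal $\tilde{W}_{11}$ and hence $W_{11} = \Sigma_\Psi^{-1}\tilde{W}_{11}\Sigma_\Psi^{-1}$. Taking an SVD of $W_{11}$ and setting $W_{12} = 0$, $W_{21} = 0$, and $W_{22} = 0$ then gives $W$, in which $V_{22}$ is any orthonormal matrix of dimension $N - \tilde{N}$.
Since $\Phi^{T}\Phi = U_\Psi W U_\Psi^{T}$, a class of solutions $\hat{\Phi}$ to (31) is obtained as (46), where $\Sigma_{11}$ comes from the SVD of $W_{11}$ and $U$ is any orthogonal matrix of dimension $M$. As seen from the above discussion, the optimization of the projection matrix $\Phi$ depends only on the dictionary $\Psi$ and the two correction constants; therefore, for a given $\Psi$ and given correction constants, $\hat{\Phi}$ is obtained directly, which is an advantage over a traditional feature selection process. Moreover, (46) is an analytical solution. In addition, there are two degrees of freedom, $U$ and $V_{22}$, in the result, which provide the possibility of further improving system performance.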
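The structure of this solution can be illustrated numerically. The sketch below is not the paper's exact expression (46): it is one way to minimize $\|G_t - \Psi^{T}\Phi^{T}\Phi\Psi\|_F$ for a symmetric target Gram, assuming the standard fact that the best positive semidefinite approximation of rank at most $M$ to a symmetric matrix keeps its $M$ largest eigenvalues, clipped at zero. The function name `optimize_projection` is hypothetical, and the free orthogonal factors are fixed to identity.

```python
import numpy as np

def optimize_projection(Psi, G_t, M, rel_tol=1e-10):
    """Sketch: Phi (M x N) fitting the Gram of Phi @ Psi to a symmetric G_t."""
    N = Psi.shape[0]
    U, sv, Vt = np.linalg.svd(Psi, full_matrices=False)
    r = int(np.sum(sv > rel_tol * sv[0]))        # numerical rank of Psi
    U1, sig, V1 = U[:, :r], sv[:r], Vt[:r].T
    G11 = V1.T @ G_t @ V1                        # block of the rotated target
    G11 = 0.5 * (G11 + G11.T)                    # guard against round-off asymmetry
    lam, Q = np.linalg.eigh(G11)                 # eigenvalues in ascending order
    k = min(M, r)
    top = np.argsort(lam)[::-1][:k]              # indices of the k largest
    lam_top = np.clip(lam[top], 0.0, None)       # keep only the PSD part
    B = (np.sqrt(lam_top)[:, None] * Q[:, top].T) / sig[None, :]
    Phi = np.zeros((M, N))
    Phi[:k] = B @ U1.T                           # remaining rows stay zero
    return Phi

rng = np.random.default_rng(1)
Psi = rng.standard_normal((8, 12))
G_t = Psi.T @ Psi                                # the choice G_t = Gram of Psi, as in [14]
Phi = optimize_projection(Psi, G_t, 4)
A = Phi @ Psi
print(np.linalg.norm(G_t - A.T @ A, "fro"))
```

Any left multiplication of the returned matrix by an $M \times M$ orthogonal matrix leaves the Gram of $\Phi\Psi$ unchanged, which mirrors the degree of freedom $U$ noted above.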

Comment 3.
Our proposed optimal sensing matrix design problem shares the same form as that in [14]. It should be pointed out that, in [14], $G_t = G_\Psi$ and the solutions are relatively easy to obtain, while our $G_t$ is a generalization of $G_\Psi$ and the solutions are actually applicable to any symmetric $G_t$. In terms of applications, ours is for face recognition, in which the degrees of freedom brought by the two correction parameters make our approach outperform that in [14].

The Proposed CS-Based Frameworks for FR.
With the obtained optimal sensing matrix $\hat{\Phi}$, any given high dimensional face image $\mathbf{x}$ can be projected to a vector of much lower dimension, which is transmitted to the server of the distributed monitoring system for classification/recognition. Two PMO-based classification methods are proposed here. The first one, denoted by $\mathrm{Alg}^{\mathrm{SRC}}_{\mathrm{PMO}}$, has exactly the same structure as $\mathrm{Alg}^{\mathrm{SRC}}_{\mathrm{PCA}}$ and $\mathrm{Alg}^{\mathrm{SRC}}_{\mathrm{SPP}}$, except that the projection matrix $\Phi$ is given by (46).
The second one, denoted by $\mathrm{Alg}^{\ell_2}_{\mathrm{PMO}}$, is outlined as follows.
Step 1. At a subsystem that captures a face image $\mathbf{x}$, compress it using

$$\mathbf{y} = \hat{\Phi}\mathbf{x}$$

and then encode and transmit $\mathbf{y}$ to the server of the distributed monitoring system.
Step 2. At the server that receives $\mathbf{y}$, compute

$$\rho_i \triangleq \min_{\mathbf{s}} \|\mathbf{y} - A_i\mathbf{s}\|_2, \quad i = 1, \ldots, C,$$

where $A_i \in \mathbb{R}^{M \times n}$ is the equivalent dictionary for the $i$th class:

$$A_i \triangleq \hat{\Phi}\Psi_i.$$

Step 3 (classification). $\mathbf{x}$ belongs to the $\hat{i}$th class, where $\hat{i}$ is determined with

$$\hat{i} \triangleq \arg\min_{i \in \{1, \ldots, C\}} \rho_i.$$

In the next section, we will examine the performance of the proposed algorithms and compare it with that of some existing ones.
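The second classifier reduces to one least-squares fit per class. A minimal sketch of the server-side steps is given below, assuming the per-class equivalent dictionaries are available as a list; the names `l2_classify` and `A_blocks` are hypothetical, and the random matrices only stand in for an optimized sensing matrix and real face dictionaries.

```python
import numpy as np

def l2_classify(A_blocks, y):
    """Fit y with each class dictionary by least squares; pick the best fit."""
    residuals = []
    for A_i in A_blocks:
        s_i, *_ = np.linalg.lstsq(A_i, y, rcond=None)   # min_s ||y - A_i s||_2
        residuals.append(np.linalg.norm(y - A_i @ s_i))
    residuals = np.array(residuals)
    return int(np.argmin(residuals)), residuals

# Toy setup: N = 50, M = 20, three classes with n = 4 atoms each.
rng = np.random.default_rng(2)
Phi = rng.standard_normal((20, 50))                      # stand-in sensing matrix
Psi_blocks = [rng.standard_normal((50, 4)) for _ in range(3)]
A_blocks = [Phi @ Psi_i for Psi_i in Psi_blocks]         # per-class equivalent dictionaries
x = Psi_blocks[1] @ np.array([1.0, -2.0, 0.5, 3.0])      # sample lying in class 1
y = Phi @ x                                              # what the agent transmits
print(l2_classify(A_blocks, y))
```

The fit is only discriminative when the measurement dimension exceeds the number of atoms per class; otherwise every class dictionary can reproduce the measurement exactly.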

Experiment Results
In this section, we examine the performance of various classification systems on five face databases: ORL [24], Yale [25], Yale Extend [26] (referred to as Yale-E), CMU PIE [27] (referred to as PIE), and AR [28]:
(i) ORL database contains 400 images of 40 individuals (each provides 10 images); that is, $C = 40$. Some images are captured at different times, with varying lighting, facial expressions, and facial details. In our test, we randomly select $n = 5$ images from each individual to form the dictionary set, while the remaining ones are for testing.
(iii) Yale-E database includes the Yale face database B and the extended Yale face database. A subset called Yale Extend face database is collected from these two databases, which contains 2414 face images of 38 subjects; that is, $C = 38$. In our test, we randomly select $n = 20$ images from each individual to form the dictionary set, while keeping the remaining ones for testing.
(iv) PIE database is composed of 68 subjects (i.e., persons), each providing 26 images. A random selection of $n = 20$ images per individual is taken to form the dictionary set, and the rest of the database is for testing.
For each of the five databases, we take all $C$ classes (different persons), with each class containing $n$ different samples (images), leading to $L = Cn$. All images are resized to a resolution of 32 × 32 and normalized in scale; hence, each face sample is represented as a column vector $\mathbf{x} \in \mathbb{R}^{N \times 1}$ with $N = 1024$. By doing so, the corresponding dictionary $\Psi \in \mathbb{R}^{N \times L}$ is then generated.
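The dictionary construction described above can be sketched as follows. This is an illustrative helper, not the authors' code: the name `build_dictionary` is hypothetical, the images are assumed to be already resized to 32 × 32 and stacked in one array, and random data stands in for a real face database.

```python
import numpy as np

def build_dictionary(images, labels, n, seed=0):
    """Pick n random images per class as unit-norm atoms; the rest are test data."""
    rng = np.random.default_rng(seed)
    images = np.asarray(images, dtype=float)
    labels = np.asarray(labels)
    train_idx = []
    for c in np.unique(labels):                      # classes in sorted order
        idx = np.flatnonzero(labels == c)
        train_idx.extend(rng.choice(idx, size=n, replace=False))
    train_idx = np.array(train_idx)
    test_idx = np.setdiff1d(np.arange(len(labels)), train_idx)
    Psi = images[train_idx].reshape(len(train_idx), -1).T    # N x L, one atom per column
    Psi = Psi / np.linalg.norm(Psi, axis=0, keepdims=True)   # unit l2-norm atoms
    return Psi, labels[train_idx], test_idx

# Stand-in data: 4 "persons" with 10 images each, already 32 x 32.
rng = np.random.default_rng(0)
images = rng.random((40, 32, 32)) + 0.1
labels = np.repeat(np.arange(4), 10)
Psi, atom_labels, test_idx = build_dictionary(images, labels, n=5)
print(Psi.shape, len(test_idx))
```

Atoms are grouped class by class, so the returned dictionary has the block structure used throughout the paper.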
Systems consisting of different compression methods (PCA, SPP, RDM for random sampling matrix, and PMO) and various classifiers (SRC, RSC, Wlearning for [22], and $\ell_2$) are compared. For instance, $\mathrm{Alg}^{\mathrm{SRC}}_{\mathrm{PCA}}$ denotes the system combining PCA and SRC, with the subscript referring to the compression method and the superscript to the classifier.

The Effect of .
Firstly, we briefly discuss the effect of the number of training samples with ORL database. Figure 4 shows the results of Alg 2 PMO versus with different values and = 2, = 80.
As can be seen, the recognition rate generally increases, with slight fluctuation, as the number of training samples $n$ gets bigger. This is reasonable: when more samples are used for representation, higher accuracy is achieved. In the following, proper values of $n$ are chosen for the different databases, as introduced at the beginning of this section.

The Choice of the Correction Parameter .
One highlight of the proposed PMO method is the correction parameter . Now, we set up experiment for testing the effect of . Fixing = 80 and = 2, the recognition rate versus the correction parameter is depicted in Figure 5.  It is clearly seen from the figure that the correction parameter does affect the performance of classification systems though the value should be chosen as database adapted.

Comparison of Different Classifiers.
In this part, we examine the performance of the classification methods (i.e., SRC, RSC, $\ell_2$, and Wlearning) with PMO applied for compression. Figure 6 shows the recognition results with the ORL database used.
The computation times of different methods are given in Table 1.
For this case, the proposed $\ell_2$ classifier achieves the best results in almost every case. The recognition rates of RSC are comparable in some cases, but its computational complexity is much higher than that of the others, as can be concluded from Table 1; this is also why RSC is not suitable for large distributed systems.
Figure 7: Recognition rates versus measurement dimension.

Comparison of Different Compression Methods.
The scenarios in which the various compression methods are combined with SRC are investigated, and the statistical results are summarized in Table 2.
The computation times on the ORL database are given in Table 3.
The data in Table 2 demonstrate the superiority of the proposed PMO method. The computation times of the various systems differ from one another in the projection matrix design, that is, the dimensionality reduction procedure. PMO costs more time than PCA. However, the design process is an offline job, and it does not put any additional load on the front agents.
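The quantity that the offline design works on can be illustrated with a short sketch. Following the criterion stated in the abstract, it evaluates the squared Frobenius norm of the difference between a target Gram matrix and the Gram matrix of the equivalent dictionary ΦΨ, together with the mutual coherence of that dictionary. The identity target Gram, the random projection matrix, and the function names are assumptions made for illustration; the sketch does not implement the analytical PMO solution.

```python
import numpy as np

def gram_mismatch(Phi, Psi, G_target):
    """Squared Frobenius distance between G_target and the Gram matrix of Phi @ Psi."""
    D = Phi @ Psi
    return np.linalg.norm(G_target - D.T @ D, ord="fro") ** 2

def mutual_coherence(D):
    """Largest absolute inner product between distinct normalized columns of D."""
    Dn = D / np.linalg.norm(D, axis=0)
    G = np.abs(Dn.T @ Dn)
    np.fill_diagonal(G, 0.0)          # ignore each column's product with itself
    return G.max()

# Synthetic stand-in data; sizes are arbitrary illustrative choices.
rng = np.random.default_rng(0)
n_pixels, n_meas, n_atoms = 1024, 80, 20
Psi = rng.standard_normal((n_pixels, n_atoms))
Psi /= np.linalg.norm(Psi, axis=0)
Phi = rng.standard_normal((n_meas, n_pixels)) / np.sqrt(n_meas)

# Assumption: identity target Gram, i.e. ideally orthonormal compressed atoms.
cost = gram_mismatch(Phi, Psi, np.eye(n_atoms))
mu = mutual_coherence(Phi @ Psi)
```

Because both quantities depend only on the training dictionary, they can be evaluated and optimized entirely offline, consistent with the remark that the design places no burden on the front agents.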

Recognition Rate versus Compression Dimension.
In this part, we examine the performance of the different systems at various compression rates on the ORL database for = 0, = 2, and = 5. The four compression methods are each combined with SRC. In addition, PMO combined with the proposed $\ell_2$ classifier is included for comparison. Figure 7 displays the recognition accuracies versus the measurement dimension.
As can be seen, higher recognition rates are obtained as the measurement dimension increases. The results in Figure 7 are consistent with the statistics in Table 2: PMO achieves the best recognition rate, ahead of PCA and RDM, and SPP once again performs the worst.

Effect of Occlusion.
One of the most challenging problems in FR is robustness to face occlusion. In this subsection, we test the performance of the proposed method under different occlusion scenarios. The AR database contains occluded face images. Taking these samples for testing, Figure 8 shows the recognition rates versus the measurement dimension.
This figure indicates that, even in occlusion scenarios, better recognition rates are achieved as the measurement dimension increases, and SPP is the worst of all. Figure 9 depicts the recognition rates versus the occlusion percentage on the ORL database, with = 80, = 0.02, and = 2 for the proposed PMO.
The results in Figure 9 demonstrate that SPP and RDM are more sensitive to occlusion, and that PMO and PCA always perform better than the other two.
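An experiment over occlusion percentages requires test images with a controlled fraction of pixels corrupted. The sketch below shows one common way to generate them, by overwriting a randomly placed square block. The text does not specify how the occlusion was produced, so the square shape, the constant fill value, and the function name `occlude` are assumptions.

```python
import numpy as np

def occlude(image, fraction, rng, fill=0.0):
    """Return a copy of image with a random square block, covering roughly
    `fraction` of the pixels, replaced by `fill`."""
    h, w = image.shape
    side = int(round(np.sqrt(fraction * h * w)))
    side = min(side, h, w)            # block cannot exceed the image
    out = image.copy()
    if side == 0:
        return out
    top = rng.integers(0, h - side + 1)
    left = rng.integers(0, w - side + 1)
    out[top:top + side, left:left + side] = fill
    return out

rng = np.random.default_rng(0)
face = rng.random((32, 32)) + 1.0     # strictly positive stand-in image
occluded = occlude(face, 0.25, rng)   # 16 x 16 block on a 32 x 32 image
```

Sweeping `fraction` over a grid and classifying each occluded image yields a recognition-rate curve of the kind plotted against occlusion percentage.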

Image Reconstruction Experiment.
The PMO algorithm is applied to a distributed intelligent monitoring system, in which image reconstruction can be performed on the server terminal. Denote the test sample by $x \in \mathbb{R}^{N \times 1}$ and the reconstructed signal by $\hat{x} \in \mathbb{R}^{N \times 1}$. The mean square error (MSE) is defined as
$$\mathrm{MSE} \triangleq \frac{1}{N}\,\|x - \hat{x}\|_2^2 .$$
A popular indicator for evaluating image reconstruction accuracy is the peak signal-to-noise ratio (PSNR), defined as
$$\mathrm{PSNR} \triangleq 10 \times \log_{10}\left[\frac{(2^{r}-1)^2}{\mathrm{MSE}}\right]$$
with $r = 8$ bits/pixel. Figure 10 shows the PSNR for varying measurement dimension with = 0.02 and = 2 for the proposed PMO.
Again, the results of PMO and PCA are similar and better than the others. Note that, as $\mathrm{Alg}^{\mathrm{SRC}}_{\mathrm{SPP}}$ performs much worse than the others, it is omitted from this figure. One example is given in Figure 11 to demonstrate the visual effect of the proposed system.
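The reconstruction metrics can be computed in a few lines. The sketch below assumes the standard definitions, namely MSE as the mean squared pixel error and PSNR as 10 log10 of the squared peak value over the MSE, with pixel values on the 0 to 255 scale implied by 8 bits/pixel. The function names and the toy data are illustrative only.

```python
import numpy as np

def mse(x, x_hat):
    """Mean squared error between an image and its reconstruction."""
    x = np.asarray(x, dtype=float).ravel()
    x_hat = np.asarray(x_hat, dtype=float).ravel()
    return np.mean((x - x_hat) ** 2)

def psnr(x, x_hat, bits=8):
    """Peak signal-to-noise ratio in dB for images with `bits` bits per pixel."""
    err = mse(x, x_hat)
    if err == 0:
        return np.inf                 # identical images
    peak = 2 ** bits - 1              # 255 for 8-bit images
    return 10.0 * np.log10(peak ** 2 / err)

x = np.full(1024, 100.0)              # toy vectorized 32 x 32 image
x_hat = x + 5.0                       # constant error of 5 gray levels, MSE = 25
value = psnr(x, x_hat)
```

If the images were rescaled during preprocessing, they should be mapped back to the 0 to 255 range before calling `psnr`, otherwise the peak value in the formula no longer matches the data.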

Conclusion
This work presents a novel CS-based face recognition scheme for distributed intelligent monitoring systems. A new compression strategy has been proposed based on the projection matrix optimization. The analytical solution set of the corresponding optimization problem has also been derived. With this new compression method, two frameworks have been presented for compressive face recognition. Experimental results on face recognition tasks demonstrate the superiority of the proposed approaches.
A monitoring system routinely faces the need to register new faces and update the whole system. Online learning of FR-oriented CS systems is therefore an interesting research direction for distributed intelligent monitoring system design. In our approach, the dictionary is formed directly from the samples. It is expected that more efficient FR-oriented CS systems can be achieved if both the dictionary and the sensing matrix are optimized alternately or simultaneously.