A hybrid data-driven fault detection strategy with application to navigation sensors

The integrated navigation system relies heavily on accurate measurements from sensors that are susceptible to unknown disturbances. To improve the reliability and safety of the navigation system, fault detection of these sensors is increasingly needed. In the present study, a hybrid data-driven fault detection strategy based on residual sequence analysis is proposed. The one-class support vector machine is currently one of the most popular fault detection methods for navigation systems, with many successful applications. Therefore, the one-class support vector machine is combined with time-series similarity measurement and modified principal components analysis approaches. The similarity between the multi-sequence residuals of a real-time sample and those of normal-condition samples is computed to construct learning features for the one-class support vector machine. Similarly, the modified principal components analysis scheme projects the residuals onto subspaces to obtain learning features. The one-class support vector machine model then performs abnormal detection when unexpected sensor faults appear in the measurements and residuals. Finally, experiments are carried out to evaluate the performance of the proposed strategy for abrupt and soft faults on navigation sensors. Experimental results show that the hybrid data-driven fault detection strategy can effectively detect these faults with a short time delay and high accuracy.


Introduction
With the considerably increasing demands for reliability and stability in complex multi-sensor systems, fault detection (FD) has become an essential research field for ensuring the precise and accurate performance of sensors. Besides FD in industrial processes, the data-driven FD methodology, which is motivated by insufficient knowledge about complex systems and unknown fault types, 1-3 has attracted many scholars. Model-based FD methods have been applied in diverse applications, and remarkable results have been achieved in process monitoring and system fault diagnosis. 4,5 However, model-based FD requires physical and mathematical prior knowledge of the system. When the structure of the engineering system is complicated and the operational requirements vary across conditions, model-based approaches face significant limitations in obtaining satisfactory FD results. 6 To resolve this problem, data-driven strategies have been proposed that collect data from various operating states.
The normal input/output variables and their correlations can be obtained by training on data from a ''particular'' operating state. The trained model can then be used for process monitoring and abnormal detection. The literature indicates that data-driven FD schemes have attracted many scholars in recent years for investigating dynamic systems. The integrated navigation system, which plays a significant role in various carriers, including aircraft, vehicles, ships, and modern weapon systems, should provide accurate geographical position, velocity, and attitude information. 7 To keep the navigation sensors (i.e. gyroscopes, accelerometers, and global positioning system (GPS) receivers) at stable performance and to monitor the corresponding estimated states, researchers have proposed many FD strategies. [8][9][10][11][12][13][14][15][16][17][18] Conventional model-based approaches and data-driven approaches have been proposed accordingly for intelligent FD and fault diagnosis. 8 Moreover, considering the extensive applications of unmanned operation systems and complicated integrated navigation systems, finding a reliable real-time FD method is of great importance. The faults of an integrated navigation system can be mainly divided into two categories: (1) abrupt faults caused by hardware failures or strong impulse disturbances, and (2) soft faults, which widely exist in inertial sensors and may come from severe drift errors. Abrupt faults cause a serious deviation of the navigation system within a short time; however, they can be detected by simple analysis. Soft faults, on the other hand, affect other subsystems or even the whole system through slow changes and can hardly be detected and isolated. This is especially pronounced when soft faults originate from minor errors about which there is insufficient knowledge.
To resolve this shortcoming, FD of integrated navigation systems has received significant attention in the past decades. [9][10][11] The existing FD methods for navigation systems can generally be divided into two groups: analytical model-based methods and data-driven methods. 9 Analytical model-based methods depend on a constructed physical model and therefore require substantial prior knowledge, such as accurate parameters of the dynamic model. 12,13 For instance, the state chi-square test (SCST), the most classic method for integrated navigation systems, detects faults through a statistic constructed from the measurements and the predictions of the recursive filter. Monteriu et al. 13 presented a model-based multiple-sensor fault detection and isolation (FDI) method for unmanned ground vehicles (UGVs) using ''structural analysis'', which includes residual generation and ad hoc residual evaluation. Moreover, H∞ estimation is another effective way to perform navigation system FD through residual generator design. 15 However, residual evaluation remains an enormous challenge for nonlinear systems.
By contrast, data-driven methods solve FD problems with multivariate statistical methods and by training machine learning models on historical input/output datasets. From this point of view, the residual chi-square test, which detects various faults by a mathematical statistics method, can be classified as a data-driven method. Although the residual chi-square test has better dynamic and real-time performance, it lacks sensitivity to soft faults and relies heavily on system parameters. On the other hand, intelligent methods that do not depend on an analytical model, such as the artificial neural network (ANN), support vector machine (SVM), and Markov models (MMs), provide powerful approaches to implement data-driven FD. The literature shows that data-driven FD methods for integrated navigation systems have received much attention recently. Guo et al. 16 proposed an active FD method based on the one-class support vector machine (OC-SVM) and a deep neural network (DNN): the OC-SVM effectively detects faults of navigation sensors, and the DNN predicts the running data to replace the data recorded during the fault. Zhao et al. 17 established an FD model using the belief rule base (BRB) and adopted an expectation-maximization (EM) algorithm for recursive parameter estimation and online updating, so that the model can track the fault state and perform FD in real time. Xu and Lian 18 proposed a multi-channel single-dimensional fully convolutional neural network (MS-FCN) FD method, which extracts features from the measurement residual sequences of the sensors and discriminates the operating state with prior information. These methods use sensor sampling data or residual data directly as the training dataset of a mathematical model.
However, FD remains challenging for integrated navigation systems because of their complicated multi-sensor configurations, limited training datasets, and lack of prior knowledge of fault states. It is worth noting that, as the accuracy of recursive filter algorithms improves, residual data can be fully exploited to implement FD in a data-driven way. 19,20 Therefore, hybrid methods combining model-based and data-driven techniques have been proposed in the literature. For example, Liu et al. 21 combined the SCST with a simplified fuzzy Adaptive Resonance Theory Map (ARTMAP) neural network (SFAM) to address FD in a noisy environment.
Motivated by these studies and their successful applications in navigation systems, a new hybrid data-driven FD strategy is proposed. The proposed strategy combines the OC-SVM model with time-series similarity measurement (SIM) and modified principal components analysis (MPCA) approaches. The residual sequence from the Kalman filter (KF) is preprocessed by the SIM and MPCA approaches to characterize accumulated fault errors. Then, the dataset of the normal condition is collected and used to train the OC-SVM model for detecting sensor faults. Because prior knowledge of fault types is difficult to obtain, the proposed data-driven FD method for the integrated navigation system is formulated as an abnormal detection problem. The proposed strategy is expected to provide a real-time FD method based on status monitoring for both abrupt and soft faults.
The rest of this study is organized as follows. The ''Framework of hybrid data-driven FD'' section presents the hybrid data-driven FD framework for the integrated system. Then, the ''Residual characterization based on SIM'' and ''A modified PCA for residual characterization'' sections introduce the SIM and MPCA methods for analyzing the multi-sequence residuals, followed by a theoretical analysis of both characterizations. In the ''FD based on OC-SVM'' section, FD is formulated as an abnormal detection problem using OC-SVM. Experimental validation of the proposed strategy is presented in the ''Experiments setting and FD results'' section. Finally, concluding remarks are given in the ''Conclusion'' section.
Framework of hybrid data-driven FD
Figure 1 illustrates the main framework of the hybrid data-driven FD strategy for the integrated system. The FD sub-filter consists of the measurement, preprocessing, and FD units. The measurement unit provides the real-time sampling data of the sensors (inertial measurement unit and GPS) and the residual data of the KF. When the system operates under the normal condition, the residual sequences of the KF are stored beforehand as a multivariate time-series dataset, which is then processed into the training dataset L. The preprocessing unit consists of two approaches, SIM and MPCA. Since soft faults change slowly, the SIM and MPCA approaches are used to characterize residual errors over long intervals. To this end, fault-free residuals are first recorded over sufficiently long periods and processed by the SIM and MPCA modules to determine the characteristics of the fault-free dataset. Second, the real-time residual errors are processed by the SIM and MPCA methods and used as testing vectors for the FD unit. Finally, possible faults and failures are identified in the FD unit by the OC-SVM model, which performs abnormal detection and gives feedback when faults are determined. The framework shows that this data-driven FD strategy relies heavily on the performance of the KF, which must provide effective state estimations and measurement predictions. Unlike FD methods that apply a threshold to the residuals directly, the proposed strategy uses the residual sequences to construct learning features.
It should be noted that the module consisting of the KF, the SIM and MPCA processors, and the OC-SVM model can be regarded as a sub-filter in larger integrated navigation systems. 16 The sub-filter design can also be applied in a federated filter structure. The FD module serves as a filter and controls a switch that determines whether the connected sensor is in good condition. The main filter of the navigation system can then adjust its filtering mode to generate reliable navigation data. Therefore, the FD module plays a significant role in the integrated navigation system. Furthermore, the hybrid data-driven FD method provides a basis for fault diagnosis and fault-tolerant techniques that meet the reliability requirements of navigation sensors.

Residual characterization based on SIM
In the residual chi-square test method, soft faults manifest as minor errors. Moreover, the predicted state $\hat{X}(k, k-1)$ tracks the faulty output, so the remaining residual is small and can hardly be detected. Finding informative patterns in the residuals, and hence a promising alternative for navigation system FD, is therefore a challenging task. For this reason, instead of testing the residual at a single epoch, the proposed method analyzes the variation characteristics of the residuals over a certain period. The main purpose of residual characterization is to discover meaningful features under different fault conditions. For the real-time analysis of multivariate time series, FD methods mainly require three components: (1) an appropriate representation, (2) SIM, and (3) suitable pattern recognition. 22 Clustering algorithms 23 and time-series fitting methods 24 are successful data-mining approaches in diverse FD tasks. Accordingly, the SIM method based on multivariate time series is introduced into the FD strategy of the integrated navigation system, where it serves as a preprocessor to characterize informative fault patterns.
Assume that the dynamic model of a discrete integrated navigation system with a fault can be formulated as below, where $Z(k) \in \mathbb{R}^m$, $X(k) \in \mathbb{R}^n$, and $\Phi(k, k-1) \in \mathbb{R}^{n \times n}$ denote the measurements of the system, the system state, and the state transition matrix, respectively. Moreover, $\Gamma(k-1) \in \mathbb{R}^{n \times r}$ is the noise matrix, and $W(k-1) \in \mathbb{R}^r$ and $V(k) \in \mathbb{R}^m$ are independent Gaussian white noise sequences. $\gamma$ is a random fault sequence and $f(k, u)$ is a piecewise function, which can be expressed as follows, where u denotes the time at which the fault occurs. The recursive state prediction $\hat{X}(k, k-1)$ and the measurement prediction $\hat{Z}(k, k-1)$ of the system at time k are computed recursively as follows. In the residual chi-square test method, a statistic is constructed from the predicted measurements $\hat{Z}(k, k-1)$ and the real measurements $Z(k)$ of one epoch. Assume that the residual under the normal condition is denoted $r_k$; it is zero-mean Gaussian white noise. When a fault occurs at time u, the system state can be expressed in the form below, where $\gamma_k$ represents the error added to the residual vector at time k. As discussed above, when a soft fault is represented by $\gamma_k$, the chi-square test cannot detect it effectively, because once the fault occurs, the state estimate recursively follows the soft fault. To improve the sensitivity of soft FD, the SIM module is applied in the present study to characterize the accumulated fault errors over a period of time. Assuming that the SIM module stores residual sequences of length a, the multivariate time-series dataset D at time k can be obtained as follows:
$D = [r_{k-a+1}, r_{k-a+2}, \ldots, r_k]^T$ (5)
Equation (5) indicates that $D \in \mathbb{R}^{a \times m}$ is a multi-attribute time-series collection. Once a fault occurs in the navigation system sensors, the fault error accumulates and appears in this multivariate time series of residuals.
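A minimal Python sketch of the buffering step behind equation (5) and of the single-epoch residual chi-square statistic is given below. It is an illustration under stated assumptions, not the authors' implementation: the class `ResidualWindow` and the function `chi_square_statistic` are hypothetical names, and the innovation covariance `S_k` is assumed to be supplied by the KF.

```python
from collections import deque

import numpy as np


class ResidualWindow:
    """Hypothetical helper: keeps the latest `a` KF residual vectors r_k of length m."""

    def __init__(self, a: int, m: int):
        self.a, self.m = a, m
        self._buf = deque(maxlen=a)  # the oldest residual is dropped automatically

    def push(self, r_k) -> None:
        self._buf.append(np.asarray(r_k, dtype=float).reshape(self.m))

    def ready(self) -> bool:
        return len(self._buf) == self.a

    def matrix(self) -> np.ndarray:
        """Return D as an (a x m) array whose rows run from r_{k-a+1} to r_k."""
        if not self.ready():
            raise ValueError("fewer than a residuals have been stored")
        return np.vstack(list(self._buf))


def chi_square_statistic(r_k, S_k) -> float:
    """Single-epoch residual chi-square statistic r_k^T S_k^{-1} r_k."""
    r_k = np.asarray(r_k, dtype=float)
    return float(r_k @ np.linalg.solve(S_k, r_k))


# Usage: a = 4 epochs of m = 3 residual channels (hypothetical sizes).
win = ResidualWindow(a=4, m=3)
for k in range(6):
    win.push([k, 0.0, 0.0])
D_k = win.matrix()  # holds the residuals pushed at k = 2, 3, 4, 5
```

For zero-mean Gaussian residuals with covariance S_k, the single-epoch statistic follows a chi-square distribution with m degrees of freedom, which is why the residual chi-square test compares it with a chi-square quantile. The window, by contrast, keeps a epochs so that a slowly growing error can be characterized over the whole interval.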
Subsequently, the SIM module is applied to construct the learning features of the OC-SVM model, using the dynamic time warping (DTW) method. A residual sequence dataset L of the system in good condition, called the normal multi-sequence, is measured and collected as follows. When the system performs the actual navigation task, a real-time residual is generated by the local filter at each discrete epoch k, and the results over a period of time are stored as the multi-sequence $D_a$, as expressed below. Then, a residual epochs are selected from the normal multi-sequence dataset L. In the present study, random selection is used so that the validity of the proposed method does not depend on a particular time. The selected normal multi-sequence is denoted $L_a$. Subsequently, the DTW distance between the same variables of $D_a$ and $L_a$ is computed, and the SIM value of $D_a$ and $L_a$ is described as
$Sim_k = [\mathrm{DTW}(d_1, l_1), \mathrm{DTW}(d_2, l_2), \ldots, \mathrm{DTW}(d_m, l_m)]$
where $d_j$ and $l_j$ denote the j-th columns (variables) of $D_a$ and $L_a$, respectively. Because it is built on DTW distances, the similarity measurement $Sim_k$ integrates the error accumulated while the fault persists, and it also serves as the learning feature vector of the OC-SVM module.
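The per-variable DTW comparison can be sketched as follows. This is a sketch under assumptions: the absolute-difference local cost, the choice of L_a as a random block of a consecutive epochs, the random stand-in data, and the helper names `dtw_distance`, `select_normal_window`, and `sim_features` are illustrative and are not taken from the paper.

```python
import numpy as np


def dtw_distance(x, y) -> float:
    """Classic dynamic time warping distance with |x_i - y_j| as the local cost."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = len(x), len(y)
    acc = np.full((n + 1, p + 1), np.inf)  # accumulated-cost matrix
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, p + 1):
            cost = abs(x[i - 1] - y[j - 1])
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, p])


def select_normal_window(L: np.ndarray, a: int, rng: np.random.Generator) -> np.ndarray:
    """Pick a random block of a consecutive epochs (L_a) from the normal dataset L (N x m)."""
    start = rng.integers(0, L.shape[0] - a + 1)
    return L[start:start + a]


def sim_features(D_a: np.ndarray, L_a: np.ndarray) -> np.ndarray:
    """Sim_k: DTW distance between the same residual variable (column) of D_a and L_a."""
    return np.array([dtw_distance(D_a[:, j], L_a[:, j]) for j in range(D_a.shape[1])])


rng = np.random.default_rng(0)
L = rng.normal(size=(1000, 6))    # stand-in normal multi-sequence (N = 1000, m = 6)
D_a = rng.normal(size=(15, 6))    # stand-in real-time window (a = 15)
L_a = select_normal_window(L, a=15, rng=rng)
sim_k = sim_features(D_a, L_a)    # one feature per residual variable
```

The quadratic DTW loop is adequate for short windows such as a = 15; for long sequences a constrained (windowed) DTW would be cheaper.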

A modified PCA for residual characterization
Studies show that PCA is a basic and efficient statistical method that can extract and preserve a significant amount of the information on data variability; it was originally proposed for dimension reduction. Moreover, PCA has a simple structure and is well suited to handling large amounts of stationary process data with Gaussian-distributed variables. The PCA scheme has been widely and successfully employed as a multivariate statistical tool in many status monitoring and fault diagnosis applications. [25][26][27] Based on hybrid linear-nonlinear statistical modeling, Deng et al. 28 proposed a serial PCA (SPCA) for nonlinear process monitoring. Peng et al. 29 reported a kernel independent and principal components analysis (kernel ICA-PCA) for the hot strip mill process. As an effective data-driven FD and diagnosis tool based on multivariate statistical process monitoring, PCA and its extensions have been investigated by many researchers. 1 In this section, a modified PCA is proposed to obtain a residual characterization vector that efficiently characterizes residuals as learning features for the OC-SVM method.
Similar to the SIM method, a residual dataset recorded under the normal condition is centered to zero mean and scaled to unit variance for training purposes. In the proposed hybrid FD framework, this multivariate dataset L is shared by the SIM and PCA methods as follows. The covariance matrix is defined as $\Sigma = \frac{1}{N-1} L^T L$. Then, singular value decomposition (SVD) is performed on the covariance matrix, $\Sigma = P \Lambda P^T$, where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_m)$ with $\lambda_1 \geq \cdots \geq \lambda_m \geq 0$ is the singular value matrix. According to the magnitude of the singular values, P and Λ are partitioned as $P = [P_{pc} \; P_{res}]$ and $\Lambda = \mathrm{diag}(\Lambda_{pc}, \Lambda_{res})$, where $P_{pc} \in \mathbb{R}^{m \times b}$ and $P_{res} \in \mathbb{R}^{m \times (m-b)}$ contain the singular vectors corresponding to the b largest singular values in $\Lambda_{pc}$ and the (m − b) smallest singular values in $\Lambda_{res}$, respectively. The subspaces spanned by $P_{pc}$ and $P_{res}$ are called the principal subspace and the ''residual'' subspace, respectively. In basic PCA-based FD, the measured variable z is projected onto these two orthogonal subspaces, and evaluation thresholds are defined for the two projections (typically through the T² and Q statistics). However, to avoid missing a fault that appears in only one subspace, the proposed strategy adopts a combined method that uses both test statistics simultaneously. In the proposed modified PCA method, the real-time multivariate residual sequence is used as the input matrix to form a learning vector for the OC-SVM model. Assume that $D_a$ is an a × m measured multivariate residual dataset. The corresponding residual characterization is then expressed as a matrix $T^2_{ReC} \in \mathbb{R}^{a \times a}$, which is obtained from the principal subspace $P_{pc}$ and the ''residual'' subspace $P_{res}$. The diagonal elements of $T^2_{ReC}$ are chosen as the learning features of the OC-SVM model. In this way, the modified PCA method characterizes the errors accumulated in the residual sequences over a epochs in the vector $PCA_k$.
These residual characterizations therefore provide additional learning features for the OC-SVM method.
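A possible implementation of the MPCA characterization is sketched below. The paper does not give the exact form of $T^2_{ReC}$, so the combined index used here, $D_a M D_a^T$ with $M = P_{pc}\Lambda_{pc}^{-1}P_{pc}^T + P_{res}P_{res}^T$ (an unweighted sum of the T² and Q statistics of each epoch), is an assumption, as are the helper names `fit_pca` and `mpca_features` and the synthetic data.

```python
import numpy as np


def fit_pca(L: np.ndarray, b: int) -> dict:
    """Standardize normal residuals L (N x m) and split the covariance SVD into b + (m - b) parts."""
    mu = L.mean(axis=0)
    sd = L.std(axis=0, ddof=1)
    Z = (L - mu) / sd
    cov = Z.T @ Z / (Z.shape[0] - 1)     # covariance of the standardized data
    P, lam, _ = np.linalg.svd(cov)       # cov = P diag(lam) P^T, lam in descending order
    return {"mu": mu, "sd": sd, "P_pc": P[:, :b], "P_res": P[:, b:], "lam_pc": lam[:b]}


def mpca_features(D_a: np.ndarray, model: dict) -> np.ndarray:
    """Hypothetical combined index: diag(Z M Z^T), one value per epoch of the window.

    M = P_pc diag(1/lam_pc) P_pc^T + P_res P_res^T, so each diagonal entry is the
    T^2 statistic plus the (unweighted) Q statistic of that epoch.
    """
    Z = (D_a - model["mu"]) / model["sd"]
    M = (model["P_pc"] @ np.diag(1.0 / model["lam_pc"]) @ model["P_pc"].T
         + model["P_res"] @ model["P_res"].T)
    T_rec = Z @ M @ Z.T                  # (a x a) characterization matrix
    return np.diag(T_rec).copy()


rng = np.random.default_rng(1)
mix = rng.normal(size=(4, 4))
L = rng.normal(size=(1000, 4)) @ mix    # correlated stand-in for normal residuals (m = 4)
model = fit_pca(L, b=2)
normal_window = L[:15]                   # a = 15 epochs
offset = 10.0 * L.std(axis=0, ddof=1) * np.array([1.0, 0.0, 0.0, 0.0])
faulty_window = normal_window + offset   # hypothetical bias on the first residual variable
pca_normal = mpca_features(normal_window, model)
pca_fault = mpca_features(faulty_window, model)
```

Each diagonal element summarizes all m residual variables at one epoch, so a window of a epochs yields an a-dimensional feature vector. Weighting the T² and Q terms by their control limits is a common alternative for a combined index.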

Theoretical analysis for residual characterization
As mentioned in the foregoing sections, the residual characterizations based on the SIM and MPCA methods are used to construct the learning features that drive the OC-SVM model to implement abnormal detection. The common advantage of the two methods is their ability to characterize the error $\gamma_k$ at each measurement epoch into learning vectors, so that the vector obtained under a fault condition differs from that obtained under the normal condition. A theoretical analysis of the advantages of the SIM and MPCA characterization methods is given below.
Assume that the real-time multi-residual sequences under the fault-free condition, $D_a^{FF}$, and under the fault condition, $D_a^{F0}$, are expressed as follows.
Comparing the sequences $V_{D1}$ and $V'_{D1}$ shows that each element of $V'_{D1}$ contains an additional error $\gamma_i$ ($1 \leq i \leq k$), which leads to a remarkable difference between the similarity measurements $Sim_k^{FF}$ and $Sim_k^{F0}$. This difference increases as the measuring time extends. Consequently, once the errors originating from a soft fault have accumulated to a certain extent, an abnormal SIM value occurs, which can be detected by the OC-SVM method.
Similarly, the multi-residual sequences $D_a^{FF}$ and $D_a^{F0}$ can be characterized by the MPCA method. Let $PCA_k^{FF}$ and $PCA_k^{F0}$ be the two vectors obtained after MPCA characterization; the following equations are then obtained. Since the fault errors affect the features learned by the PCA model, the elements of $PCA_k^{FF}$ and $PCA_k^{F0}$ differ significantly. Moreover, equations (28) and (32) indicate that the errors $[\gamma_1, \ldots, \gamma_a]$ are characterized into the elements of $Sim_k^{F0}$, whereas the m-dimensional errors $\gamma_i$ ($1 \leq i \leq k$) are characterized into the elements of $PCA_k^{F0}$. The two methods therefore characterize the errors along different directions: SIM along the time (vertical) direction of each residual variable, and MPCA along the variable (horizontal) direction at each epoch. Combining them gives the hybrid strategy (HS) of the characterization methods.
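The accumulation argument for the SIM branch can be checked with a small noise-free example. The ramp slope, window length, and zero reference channel are hypothetical values chosen only to isolate the effect of a soft fault on one residual variable, and `dtw` is a plain re-implementation of dynamic time warping, not the authors' code.

```python
import numpy as np


def dtw(x, y) -> float:
    """Classic DTW distance with |x_i - y_j| as the local cost."""
    n, p = len(x), len(y)
    acc = np.full((n + 1, p + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, p + 1):
            step = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = abs(x[i - 1] - y[j - 1]) + step
    return float(acc[n, p])


a, slope = 10, 0.1            # hypothetical window length and ramp slope per epoch
reference = np.zeros(a)       # noise-free normal channel of L_a (assumption)
results = {}
for n_faulty in (0, 5, 10):   # number of epochs since fault onset inside the window
    gamma = np.zeros(a)
    gamma[a - n_faulty:] = slope * np.arange(1, n_faulty + 1)  # soft (ramp) fault errors
    # (largest single-epoch error, DTW-based similarity feature)
    results[n_faulty] = (gamma.max(), dtw(gamma, reference))
```

With these values, five faulty epochs give a largest single-epoch error of 0.5 and a DTW distance of 1.5, while ten faulty epochs give 1.0 and 5.5: the per-epoch error grows linearly, whereas the DTW-based feature grows with the accumulated error.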

FD based on OC-SVM
FD based on the OC-SVM method is an anomaly detection approach and one of the most popular data-driven FD methods, with wide applications in diverse areas. [30][31][32][33] Studies show that this technique is especially effective when normal operation samples are easily accessible while fault samples are expensive to obtain. Therefore, since prior knowledge of unknown faults is rare, the OC-SVM method is a powerful scheme for FD of the multi-sensor navigation system.

OC-SVM
The OC-SVM method is a kernel-based support vector method trained on a dataset (the target class) that contains positive examples only. It maps the training data into a feature space and finds the unique optimal hyperplane that separates them from the origin with maximum margin; in other words, the origin is treated as an outlier of the target class. In the proposed hybrid data-driven FD strategy, the characterization vectors $Sim_K$ and $PCA_K$ of the normal-condition dataset, obtained by the SIM and MPCA methods, respectively, are treated as training samples with positive labels, where K denotes the number of training samples. The training dataset can be expressed as $\{s_1, s_2, \ldots, s_K\}$. The optimal hyperplane is described as $\langle w, \phi(s) \rangle - \rho = 0$, where $\phi$ denotes a mapping function. It is obtained by solving the optimization problem
$\min_{w, \rho, \xi} \frac{1}{2}\|w\|^2 + \frac{1}{\nu K}\sum_{i=1}^{K} \xi_i - \rho$ subject to $\langle w, \phi(s_i) \rangle \geq \rho - \xi_i$, $\xi_i \geq 0$ (34)
where w and ρ are the normal vector and the offset, respectively, and $\nu \in (0, 1]$ and $\xi_i \geq 0$ denote the regularization parameter and the slack variables, respectively. Introducing the Lagrange multipliers $\alpha_i$ into equation (34) and eliminating w yields the dual optimization problem
$\min_{\alpha} \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j \kappa(s_i, s_j)$ subject to $0 \leq \alpha_i \leq \frac{1}{\nu K}$, $\sum_{i} \alpha_i = 1$
where κ is a positive definite kernel function, such as a polynomial kernel or a Gaussian radial basis function (RBF) kernel. The kernel function lets the OC-SVM operate in the feature space, and the RBF kernel with width σ is used in the proposed strategy. After the optimal solution α is obtained, the offset is given by $\rho = \sum_{j} \alpha_j \kappa(s_j, s_i)$, where $s_i$ is any sample whose multiplier satisfies $\alpha_i \in (0, 1/(\nu K))$. The decision function of the OC-SVM is then $f(s) = \sum_{i} \alpha_i \kappa(s_i, s) - \rho$, and a new sample $s_x$ is classified by $\mathrm{sgn}(f(s_x))$. Tax and Duin 34 proposed the support vector domain description (SVDD), an equivalent formulation of the OC-SVM method, which finds the hypersphere with the lowest volume that surrounds the training samples.
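The OC-SVM formulation above can be reproduced with scikit-learn's `OneClassSVM`, which wraps LIBSVM. This sketch assumes the RBF kernel $\kappa(s_i, s_j) = \exp(-\|s_i - s_j\|^2 / (2\sigma^2))$, so that σ = 2 maps to `gamma = 1/(2σ²)`; the random training features are stand-ins for the learning vectors. LIBSVM scales the multipliers so that they sum to νK instead of 1, which multiplies f(s) by νK but leaves sgn(f(s)) unchanged; the code rebuilds f(s) from the stored support vectors to make that relation explicit.

```python
import numpy as np
from sklearn.svm import OneClassSVM

sigma, nu = 2.0, 0.1                  # kernel width and regularization parameter
gamma = 1.0 / (2.0 * sigma ** 2)      # assumed kernel: exp(-||s_i - s_j||^2 / (2 sigma^2))

rng = np.random.default_rng(2)
S_train = rng.normal(size=(300, 6))   # stand-in for K normal feature vectors
K = S_train.shape[0]
model = OneClassSVM(kernel="rbf", gamma=gamma, nu=nu).fit(S_train)

# LIBSVM multipliers satisfy 0 <= alpha'_i <= 1 and sum(alpha') = nu * K (alpha' = nu * K * alpha).
alpha_scaled = model.dual_coef_.ravel()


def f_manual(S: np.ndarray) -> np.ndarray:
    """Decision function sum_i alpha'_i kappa(s_i, s) - rho' over the stored support vectors."""
    sq_dist = ((S[:, None, :] - model.support_vectors_[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dist) @ alpha_scaled + model.intercept_[0]  # intercept_ = -rho'


S_new = np.vstack([np.zeros(6), np.full(6, 20.0)])  # a central sample and a distant one
labels = np.sign(f_manual(S_new))                   # sgn(f): +1 normal, -1 abnormal
```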

Abnormal detection algorithm
In practical applications, a detection statistic should be determined for the FD problem. In the classical FD method, for instance, the T² and Q statistics are used to monitor process data. Similarly, a corresponding threshold should be developed for OC-SVM abnormal detection. In the present study, −f(s) is selected as the distance metric, $F(s) = -f(s)$, where f(s) denotes the signed distance of sample s from the decision boundary in the feature space. 35 For the real-time multi-sequence residual data, when the corresponding $Sim_k$ or $PCA_k$ feature vector lies inside the boundary learned from the training data, F(s) is negative and the sample is classified as normal. In contrast, the residual data are regarded as outliers when F(s) becomes greater than zero. Zero is therefore the natural threshold of the distance metric for abnormal detection; however, this requires precise tuning of the parameter ν and the RBF kernel width σ. The cross-validation method proposed by Mahadevan and Shah 36 is adopted in the present work to obtain a suitable distance metric. Figure 2 shows the abnormal detection scheme for integrated navigation system FD. As discussed in the foregoing sections, the pre-collected multi-sequence residual dataset is used to construct learning vectors with the SIM and MPCA methods, and these vectors are used to train the OC-SVM model. The real-time multi-sequence residual samples are then processed by the SIM and MPCA modules, and the resulting vectors are used for abnormal detection to determine the status of the system.
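The two-stage scheme of Figure 2 (offline training, online scoring against a zero threshold) can be sketched as follows. This is a minimal sketch assuming scikit-learn's `OneClassSVM` and the RBF kernel $\exp(-\|s_i - s_j\|^2 / (2\sigma^2))$, so that `gamma = 1/(2σ²)`. The function names and the shifted test vectors are hypothetical, and the cross-validation procedure of Mahadevan and Shah is not reproduced here.

```python
import numpy as np
from sklearn.svm import OneClassSVM


def train_detector(features_normal: np.ndarray, sigma: float = 2.0, nu: float = 0.1) -> OneClassSVM:
    """Offline stage: fit the OC-SVM on learning vectors built from fault-free residuals."""
    return OneClassSVM(kernel="rbf", gamma=1.0 / (2.0 * sigma ** 2), nu=nu).fit(features_normal)


def distance_metric(model: OneClassSVM, features: np.ndarray) -> np.ndarray:
    """F(s) = -f(s): negative inside the learned boundary, positive outside."""
    return -model.decision_function(features)


def detect(model: OneClassSVM, features: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    """Online stage: raise an alarm for every sample whose distance metric exceeds the threshold."""
    return distance_metric(model, features) > threshold


rng = np.random.default_rng(3)
train = rng.normal(size=(400, 6))              # stand-in normal learning vectors
normal_test = rng.normal(size=(200, 6))
faulty_test = rng.normal(size=(50, 6)) + 8.0   # stand-in vectors shifted by a fault
model = train_detector(train)
alarm_normal = detect(model, normal_test)
alarm_faulty = detect(model, faulty_test)
```

Keeping the threshold as a parameter makes it easy to replace zero by a value tuned on held-out normal data, which is the role the cross-validation step plays in the text.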

Experiments setting and FD results
In this section, an inertial navigation system/global navigation satellite system (INS/GNSS) integrated navigation system of an unmanned aerial vehicle (UAV) is designed in the MATLAB environment to evaluate the validity of the proposed hybrid data-driven FD strategy. Both abrupt faults and soft faults are simulated on the integrated navigation system.
The training dataset is first generated by simulating the normal operation of the system. Then, several faults are injected into the navigation sensors successively at different times, and the multi-sequence residuals under the fault conditions are selected as the testing dataset. Table 1 shows the specifications of the UAV integrated navigation system, and Table 2 presents the details of the specific faults.
To obtain residual data closer to an ideal Gaussian distribution, the data from the stable segment of the trajectory under normal conditions are collected as prior data. Each simulation is conducted twice with the same trajectory to obtain the training dataset under reasonable conditions. Each simulation lasts 20 min, and 2 × 1000 groups of residual samples are collected accordingly; in other words, the number of samples is N = 1000 in each of the multivariate datasets $L_1$ and $L_2$. The two datasets are used to generate the learning features of the OC-SVM scheme. The residual data collected in this way are therefore regarded as normally distributed.
Based on the continuous residual samples, the feature vectors for training the OC-SVM model are computed with the proposed SIM and modified PCA methods. To compare the results of the two methods, the same parameter settings are applied in both; in particular, the same length a of the multi-sequence residuals $D_a$ is used. As mentioned above, however, the two methods select the multi-sequence residual datasets differently. In the SIM method, the windows playing the role of the real-time multi-sequence are taken from $L_1$, and the comparative normal windows are selected randomly from $L_2$. In the MPCA method, $D_a$ is selected from $L_2$, and $L_1$ serves as the prior multivariate dataset L on which the SVD is performed. After the feature vectors are generated, the OC-SVM model is trained using an RBF kernel with width σ = 2 and regularization parameter ν = 0.1. First, the validity of the two methods is verified.
The faults listed in Table 2 are injected into the navigation sensors, and 600 groups of real-time samples are collected for each fault during the failure period. Figures 3 and 4 show the FD results of the SIM + OC-SVM and MPCA + OC-SVM methods, respectively, where the x- and y-axes represent the detection time and the OC-SVM distance metric, respectively. The results reveal that both methods successfully detect the faults with a short delay time (DT). For an abrupt fault, both SIM + OC-SVM and MPCA + OC-SVM detect the fault without delay. For soft faults, however, a short DT of 4 to 11 s occurs in the detection process, mainly because the error has not yet accumulated sufficiently. Some detection points during the failure period also fall below the detection threshold, as shown in Figures 3(c), 4(b), and 4(d); however, these points do not compromise the overall detection effectiveness of the proposed methods. These experiments therefore verify the validity of the SIM + OC-SVM and MPCA + OC-SVM methods for checking navigation sensor data before use.
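The DT and the below-threshold points discussed above can be quantified from the per-epoch alarm sequence. In this minimal sketch the sampling period `dt_s`, the toy alarm sequence, and the function names are hypothetical; DT is taken as the time between the fault onset and the first alarm.

```python
def detection_delay(alarms, onset_index, dt_s):
    """Seconds from fault onset to the first alarm at or after it; None if never detected."""
    for k in range(onset_index, len(alarms)):
        if alarms[k]:
            return (k - onset_index) * dt_s
    return None


def missed_fraction(alarms, onset_index):
    """Share of failure-period epochs (delay included) whose metric stays below the threshold."""
    period = list(alarms[onset_index:])
    if not period:
        return 0.0
    return sum(1 for flag in period if not flag) / len(period)


# Hypothetical 1 Hz alarm sequence with a fault injected at epoch 3.
alarms = [False, False, False, False, False, True, True, False, True, True]
print(detection_delay(alarms, onset_index=3, dt_s=1.0))  # 2.0
print(missed_fraction(alarms, onset_index=3))            # 3 of the 7 failure-period epochs
```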

Comparison study with HS
The foregoing section shows that the SIM + OC-SVM and MPCA + OC-SVM methods provide effective FD with a short time delay. In this section, several OC-SVM FD methods for the navigation system are applied to evaluate the FD efficiency of the HS. To this end, the OC-SVM method based on phase space reconstruction (PSR + OC-SVM) 16 and the OC-SVM method based on multiple kernel anomaly detection (MKAD + OC-SVM) are adopted for comparison. The fault detection rate (FDR) and the false alarm rate (FAR) are used as evaluation indices: the FDR is the proportion of faulty samples whose distance metric exceeds the threshold, denoted $F > F_{th} \mid l \neq 0$, and the FAR is the proportion of normal samples whose distance metric exceeds the threshold, denoted $F > F_{th} \mid l = 0$, where l is the sample label. The proposed SIM + OC-SVM and MPCA + OC-SVM methods use the multi-sequence residuals to construct the learning features for OC-SVM. Abnormal detection methods using continuous data have been the mainstream in the past decade. Similarly, MKAD + OC-SVM is a data-driven method, derived from multiple kernel learning, that uses multivariate continuous data to detect anomalies. Its kernels can be constructed over discrete sequences and discretized continuous time series, on which the OC-SVM constructs an optimal hyperplane. Constructing these kernels amounts to measuring the similarity between discrete sequences, that is, finding a representation of the time series in which similarity is inversely related to distance. This is similar to the proposed SIM + OC-SVM method, and the comparison is intended to verify whether MKAD + OC-SVM can capture the similarity in the multivariate residual sequences. In contrast, PSR + OC-SVM uses a single sample for detection rather than multi-sequence data; this comparison tests whether faults can be detected by constructing features at a single point. In the PSR + OC-SVM method, several dimensional features are constructed from time-series navigation signals for OC-SVM training, whereas in the detection stage each sample point x is mapped into the feature space.
In the simulation, 50 groups of real-time multi-sequence residual datasets are used for comparison. The injected faults are selected from Table 2, and the corresponding parameter is set as in the "OC-SVM" section; that is, the length of the multi-sequence is a = 15. Fifty FD simulation experiments are carried out for each type of fault. Table 3 presents the detailed FD results of all methods on the simulated navigation system. The HS reports a fault whenever either SIM + OC-SVM or MPCA + OC-SVM detects an outlier in the FD process.
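The evaluation protocol can be sketched as follows. The HS is modelled here as a logical OR of the two detectors' outlier flags, which is our reading of the text; FDR is taken as the fraction of faulty samples that are flagged and FAR as the fraction of fault-free samples that are flagged. These metric definitions and all function names are assumptions for illustration, not the exact formulas used to produce Table 3.

```python
from typing import Iterable, Tuple

import numpy as np


def hybrid_flags(sim_flags: np.ndarray, mpca_flags: np.ndarray) -> np.ndarray:
    """Assumed HS rule: flag a sample if SIM + OC-SVM or MPCA + OC-SVM flags it."""
    return np.logical_or(sim_flags, mpca_flags)


def fdr_far(flags: np.ndarray, is_faulty: np.ndarray) -> Tuple[float, float]:
    """FDR = flagged faulty / all faulty; FAR = flagged normal / all normal."""
    flags = np.asarray(flags, dtype=bool)
    is_faulty = np.asarray(is_faulty, dtype=bool)
    fdr = flags[is_faulty].mean() if is_faulty.any() else float("nan")
    far = flags[~is_faulty].mean() if (~is_faulty).any() else float("nan")
    return float(fdr), float(far)


def average_over_trials(trials: Iterable[Tuple[np.ndarray, np.ndarray, np.ndarray]]) -> np.ndarray:
    """Mean (FDR, FAR) of the HS over repeated experiments, e.g. 50 per fault type.

    Each trial is (sim_flags, mpca_flags, is_faulty), all boolean arrays of equal length.
    """
    results = np.array([fdr_far(hybrid_flags(s, m), y) for s, m, y in trials])
    return results.mean(axis=0)
```

Because the OR of two flag sets contains each of them, under this assumption the HS can never have a lower FDR or a lower FAR than either of its component detectors on the same data.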
The comparison shows that the proposed SIM + OC-SVM and MPCA + OC-SVM methods achieve higher FDRs and lower FARs than PSR + OC-SVM and MKAD + OC-SVM, especially for soft faults. The HS results in Table 3 show that the HS achieves a higher FDR than any other method, at the cost of a higher FAR. The fourth and sixth columns of Table 3 show that SIM + OC-SVM and MPCA + OC-SVM have the lowest FARs of all methods while also giving better detection results. Overall, the proposed methods are better at detecting errors that accumulate over successive residuals.
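A simplified, noise-free calculation illustrates why accumulating successive residuals helps with soft faults. It assumes a drift fault whose residual grows by `slope` per sample after onset, white residual noise with standard deviation `sigma`, and a stand-in detector that sums the last `window` residuals and compares the sum with `k_sigma` times that sum's noise standard deviation. This statistic is a hypothetical illustration, not the SIM or MPCA features themselves.

```python
import math
from typing import Optional


def detection_delay(slope: float, sigma: float, window: int,
                    k_sigma: float = 3.0, horizon: int = 1000) -> Optional[int]:
    """Samples after fault onset until the noise-free windowed residual sum
    exceeds k_sigma times the noise standard deviation of that sum.

    Assumptions: residual mean slope * j at sample j >= 1 after onset,
    zero residual mean before onset, white noise with std sigma per sample.
    """
    threshold = k_sigma * math.sqrt(window) * sigma  # std of a sum of `window` white samples
    for k in range(1, horizon + 1):
        start = max(1, k - window + 1)  # pre-onset samples contribute zero mean
        accumulated = sum(slope * j for j in range(start, k + 1))
        if accumulated > threshold:
            return k
    return None  # not detected within the horizon


if __name__ == "__main__":
    for window in (1, 5, 15):
        print(window, detection_delay(slope=0.07, sigma=1.0, window=window))
```

With slope = 0.07 and sigma = 1, the single-point test (window = 1) first fires 43 samples after onset, while a 15-sample window fires after 19 samples: the window sums many small drift increments, whereas its noise threshold grows only with the square root of the window length.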

Study on residual sequence length
Since this parameter affects the FD results, this section investigates the length of the multi-sequence residuals. To balance sensitivity and FDR, an appropriate residual sequence length should be selected for constructing features: if the sequence is too short, faults are not easily detected, and if it is too long, the time delay (DT) becomes larger. Therefore, a further simulation test is performed in which SIM + OC-SVM and MPCA + OC-SVM are run with different values of this design parameter. A soft accelerometer fault is injected into the system, and the length of the multi-sequence residuals is varied from 5 to 20. Table 4 summarizes the detailed FDRs, FARs, and time-delay indices of the simulation results.
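The sequence length enters at the feature-construction step. The sketch below shows one way multi-sequence residual windows might be assembled before the SIM or MPCA step, assuming the residuals are stored as a (time, channel) array and each learning sample stacks the most recent `a` residual vectors. The function name, array layout and placeholder residuals are illustrative assumptions; `numpy.lib.stride_tricks.sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def multisequence_windows(residuals: np.ndarray, a: int) -> np.ndarray:
    """Return shape (T - a + 1, a, n_channels); window t holds residuals t .. t + a - 1."""
    residuals = np.asarray(residuals, dtype=float)
    if residuals.ndim != 2:
        raise ValueError("expected a (time, channel) array")
    # Windowing along axis 0 yields shape (T - a + 1, n_channels, a) ...
    windows = sliding_window_view(residuals, window_shape=a, axis=0)
    # ... so move the window axis in front of the channel axis.
    return np.transpose(windows, (0, 2, 1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    res = rng.normal(size=(200, 6))  # placeholder residuals, 6 hypothetical channels
    for a in range(5, 21):  # the range of lengths examined in this section
        print(a, multisequence_windows(res, a).shape)
```

For T residual samples, a length a yields T − a + 1 windows; a larger a means each window mixes a newly faulty residual with up to a − 1 older, fault-free ones.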
The FDRs and FARs in Table 4 show that the length of the multi-sequence residual significantly affects the FD performance of the SIM + OC-SVM and MPCA + OC-SVM methods. Different lengths accumulate different amounts of error in the learning features, and the performance of the HS changes accordingly. In the FDR column, all lengths beyond 15 points yield similar detection performance, which indicates that a sufficiently long sequence is essential for reliable FD. The FAR column shows that sequences that are too long or too short produce more false alarms. Therefore, an appropriate length (14 ≤ l ≤ 17 in the simulated system used here) should be determined for each specific system. Moreover, the DT column indicates that the time delay generally increases with the sequence length.
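One way to turn this trade-off into a rule for a specific system is sketched below. The criteria (an FDR floor, then the lowest FAR, then the shortest DT) and the record format are assumptions, not the procedure used in this study; the inputs would be the measured per-length FDR, FAR and DT values for the system at hand.

```python
from typing import NamedTuple, Optional, Sequence


class LengthResult(NamedTuple):
    length: int    # multi-sequence residual length
    fdr: float     # fault detection rate, 0..1
    far: float     # false alarm rate, 0..1
    delay: float   # detection time delay (DT), in samples or seconds


def choose_length(results: Sequence[LengthResult], min_fdr: float) -> Optional[int]:
    """Among lengths whose FDR reaches min_fdr, pick the lowest FAR,
    breaking ties by shorter delay and then by shorter length.

    Returns None when no length meets the FDR floor.
    """
    eligible = [r for r in results if r.fdr >= min_fdr]
    if not eligible:
        return None
    best = min(eligible, key=lambda r: (r.far, r.delay, r.length))
    return best.length
```

Raising `min_fdr` pushes the choice towards longer sequences, while ranking by FAR penalises lengths that are either too short or too long, and the DT tie-breaker prefers shorter lengths among otherwise equal candidates.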

Conclusion
In this study, a hybrid data-driven FD strategy based on multi-sequence residual analysis and OC-SVM is proposed and applied to navigation sensors. First, the basic data-driven fault diagnosis methods and their recent developments are reviewed. Then, the HS framework is presented and FD is formulated as an outlier detection problem. SIM and modified PCA are used to construct learning features in which the fault errors over a period of time accumulate, and an OC-SVM trained on these features performs the outlier detection.
The proposed strategy is validated on a simulated integrated navigation system. The training dataset is obtained under fault-free conditions, and four typical faults are injected into the simulation system. The experimental results show that both the SIM + OC-SVM and MPCA + OC-SVM methods detect abrupt and soft faults with high accuracy in real time. The HS further improves the FDR at the cost of a small increase in false alarms. The choice of multi-sequence residual length in the SIM + OC-SVM and MPCA + OC-SVM methods is also discussed. Compared with the PSR + OC-SVM and MKAD + OC-SVM methods from previous studies, the proposed data-driven FD strategy is more efficient and accurate. Future work will validate the method on real navigation sensors and integrate it with other FD approaches to further improve reliability and stability.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.