Quality-related batch process monitoring based on multi-way orthogonal signal correction enhanced total principal component regression

Batch process quality-related fault detection is necessary for maintaining operational safety and consistent product quality. However, the process variables have only a weak ability to explain the quality variables, which makes quality-related fault detection in batch processes difficult. In this work, a multi-way orthogonal signal correction enhanced total principal component regression (MOSC-ETPCR) method is proposed for nonlinear quality-related fault detection in batch processes. First, after the batch process data are unfolded, the orthogonal signal correction algorithm filters out the quality-irrelevant information in the process variables so that it does not influence process modeling. Second, the nonlinear characteristics of the process are extracted with the maximum information coefficient matrix, and a quality-related nonlinear regression model is constructed to maximize the correlation between the extracted features and the quality variables. Third, monitoring statistics and their control limits are established from the resulting regression model. Finally, the effectiveness of MOSC-ETPCR is verified on a numerical simulation and on the penicillin fermentation process.


Introduction
[3][4][5] Multivariate statistical methods have become increasingly mature in fault detection and diagnosis. As a safeguard of manufacturing quality, quality-related process monitoring technologies have attracted widespread attention from industry and academia. In actual industrial production, however, quality variables are always measured with a delay relative to process variables.[8] Batch process data are organized along three dimensions: batches, variables, and sampling points. Traditional fault detection methods therefore preprocess the data with multi-way unfolding before modeling, as in multiway PCA, multiway PLS [9], and multiway modified kernel slow feature analysis [10]. These methods mainly address static, linear, single-phase behavior rather than the dynamic, nonlinear, and multi-phase characteristics of batch processes. To address this, Zhang et al. [11] proposed a method that exploits both dynamic and static characteristics to identify process statuses. Hui and Zhao [12] proposed a batch process monitoring method that accounts for both nonlinear and dynamic characteristics. Zhu et al. [13] proposed a dynamic time-slice multi-stage batch process monitoring method in which the dynamic behavior of each phase is captured from both single batch runs and batch-to-batch evolution.
To ensure operational safety and quality reliability, faults related to product quality must be detected. Quality-related faults can be detected by establishing a supervised relationship between quality variables and process variables.[14] For quality-related batch process monitoring, Peng et al. [15] proposed a hybrid method combining PCR and Bayesian inference to identify faults. Jiang et al. [14] proposed a quality-related process monitoring method based on optimized sparse partial least squares. Peng et al. [16] proposed an over-complete independent component analysis method for non-Gaussian and nonlinear quality-related batch process monitoring. Zhu et al. [17] proposed concurrent CCA to monitor process-related and quality-related faults separately. To account for changes in statistical values or in the relationship between quality variables and process variables, a quality-related fault detection method based on a statistical model and regression coefficients [18] was proposed, in which linear regression coefficients and mutual information describe the linear and nonlinear relationships, respectively, between quality and process variables. Wang et al. [19] used OSC to divide KPI-related process variables into linear and nonlinear parts and built a monitoring model for the nonlinear part. Wang and Jiao [20] divided process variables into quality-related and quality-unrelated subspaces and used TPCR for fault detection. However, neither of these two methods fully extracts the nonlinear characteristics of process data. Zhao et al. [21] proposed a quality weakly related fault detection method based on weighted dual-step feature extraction, but the nonlinear characteristics of the separated data are not considered.
When detecting quality-related faults in batch processes, the above methods neither filter out the components of the process variables that are unrelated to quality nor fully characterize the quality-related nonlinear behavior of the batch process. For nonlinear batch processes, this paper uses an orthogonal signal correction algorithm to remove the components of the process variables that are irrelevant to the quality variables, and then uses the squared maximum information coefficient matrix to extract nonlinear features. A regression model is then established, the process space is divided into quality-related and quality-irrelevant subspaces, and a monitoring statistic is built for each subspace. Kernel density estimation (KDE) is introduced to determine the control limits for fault detection.

Principal component regression (PCR)
The core of the PCR algorithm is to extract the score matrix with principal component analysis (PCA) and then establish the regression relationship between the quality matrix and the score matrix by least squares. Consider a data set with m input variables and r samples; the input matrix can be expressed as $X \in \mathbb{R}^{r \times m}$ and the output matrix as $Y \in \mathbb{R}^{r \times l}$, where l is the number of output variables.
After the process variable matrix X is preprocessed, the covariance matrix $C = \mathrm{cov}(X) \approx X^{T}X/(r-1)$ is calculated, and the eigenvalue decomposition of C is performed with the eigenvalues arranged in descending order as $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m \ge 0$. The eigenvectors corresponding to the first k eigenvalues form a new eigenvector matrix $P = [p_1, p_2, \ldots, p_k]^T$. PCA is then used to decompose the process variables, as shown in equation (1).
where $\hat{X}$ is the principal part of the process variables, E is the residual part, B is the loading matrix, and T is the score matrix. The regression relationship between the score matrix T and the quality variable matrix Y is established by least squares, and the loading matrix Q of the quality variables is obtained from equation (2).
The predicted value $\hat{Y}$ is calculated from the score matrix T and the loading matrix Q, as shown in equation (3).
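A minimal NumPy sketch of the PCR steps above, assuming X and Y are already centered (standardized). The names `fit_pcr` and `predict_pcr` are hypothetical. The loading matrix is stored with eigenvectors as columns (m × k), which is the transpose of the row convention used for P in the text. Because equations (2) and (3) are not reproduced here, the sketch uses the standard least-squares forms $Q = (T^T T)^{-1} T^T Y$ and $\hat{Y} = TQ$.

```python
import numpy as np

def fit_pcr(X, Y, k):
    """PCR on centered X (r x m) and Y (r x l), keeping k principal components."""
    r = X.shape[0]
    C = X.T @ X / (r - 1)                    # sample covariance of centered X
    eigvals, eigvecs = np.linalg.eigh(C)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder to lambda_1 >= ... >= lambda_m
    P = eigvecs[:, order[:k]]                # m x k loadings (eigenvectors as columns)
    T = X @ P                                # r x k score matrix
    Q = np.linalg.solve(T.T @ T, T.T @ Y)    # least squares: Q = (T^T T)^{-1} T^T Y
    return P, Q

def predict_pcr(X, P, Q):
    """Predicted quality Y_hat = T Q with T = X P."""
    return X @ P @ Q
```

With k = m the prediction coincides with ordinary least squares on X; smaller k discards low-variance directions before regression.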
Multi-way orthogonal signal correction enhanced total principal component regression algorithm (MOSC-ETPCR)

Preprocessing of batch process data
The three-dimensional data matrix $X(I \times J \times K)$ of a batch process consists of I batches, J variables, and K sampling points. First, $X(I \times J \times K)$ is unfolded into $X(I \times KJ)$ and normalized; then $X(I \times KJ)$ is rearranged into $X(KI \times J)$. The data expansion is illustrated in Figure 1.
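The unfolding, normalization, and rearrangement can be sketched in NumPy as follows. This is an illustration under assumptions: the raw array is stored with shape (I, J, K), the columns of $X(I \times KJ)$ are grouped by sampling point, and the rows of $X(KI \times J)$ are ordered time-major (row k·I + i holds batch i at time k). The ordering in Figure 1 may differ, and `unfold_batch` is a hypothetical helper.

```python
import numpy as np

def unfold_batch(X3):
    """X3 has shape (I, J, K): batches x variables x sampling points."""
    I, J, K = X3.shape
    # Batch-wise unfolding X(I x KJ): column index = k * J + j
    X_bw = X3.transpose(0, 2, 1).reshape(I, K * J)
    # Normalize each column across batches (removes the mean trajectory)
    mean = X_bw.mean(axis=0)
    std = X_bw.std(axis=0, ddof=1)
    std[std == 0] = 1.0                      # guard against constant columns
    X_bw = (X_bw - mean) / std
    # Rearrange into variable-wise form X(KI x J): row index = k * I + i
    X_vw = X_bw.reshape(I, K, J).transpose(1, 0, 2).reshape(K * I, J)
    return X_vw, mean, std
```

The returned `mean` and `std` are the per-column statistics needed later to standardize online data with the offline model.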

Selection of quality-related features
Because the unfolded batch process variables $X(KI \times J)$ contain components orthogonal to the quality variables Y, the orthogonal signal correction (OSC) algorithm [22] is used to filter out the information in the process variables that is orthogonal to the quality variables, which strengthens the correlation between the quality variables and the process variables. A quality-related model built on OSC-corrected data therefore has a stronger ability to explain the quality variables. First, $X(KI \times J)$ and Y are standardized, and X is modeled by PCA to obtain the principal component score vector t, which is then orthogonalized with respect to Y, as shown in equation (4).
where I is the identity matrix. PLS is then used to calculate the weight vector v, and the new score vector $t_{new}$ is obtained as shown in equation (5).
Convergence is checked with $\|t_{new} - t\| / \|t_{new}\| < 10^{-6}$. If this condition is not satisfied, the procedure returns to equation (4) and t is again orthogonalized with respect to Y. Once it is satisfied, the loading vector $p^{T} = t_{new}^{T}X/(t_{new}^{T}t_{new})$ is calculated. Finally, the orthogonal signals are subtracted from X, giving equation (6). Equation (6) is the process variable matrix after orthogonal signal correction, from which the components orthogonal to the quality variables have been removed.
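A minimal sketch of one OSC component following the loop of equations (4) to (6), assuming standardized X (n × m) and Y. One simplification is labeled in the code: the PLS weight step is replaced by an ordinary least-squares regression of the orthogonalized score on X (equivalent to PLS with all components retained), so each new score is the projection of the Y-orthogonalized score back onto the column space of X. `osc_one_component` is a hypothetical name, and the paper may remove more than one component.

```python
import numpy as np

def osc_one_component(X, Y, tol=1e-6, max_iter=5000):
    """Remove one OSC component from standardized X (n x m) given Y (n x l)."""
    Y = np.asarray(Y, dtype=float).reshape(X.shape[0], -1)

    def orthogonalize_to_Y(t):
        # (I - Y (Y^T Y)^{-1} Y^T) t, computed by least squares instead of an n x n matrix
        coef = np.linalg.lstsq(Y, t, rcond=None)[0]
        return t - Y @ coef

    # Start from the score vector of the first principal component of X
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    t = X @ Vt[0]
    for _ in range(max_iter):
        t_orth = orthogonalize_to_Y(t)                   # eq. (4)
        w = np.linalg.lstsq(X, t_orth, rcond=None)[0]    # weights: OLS stand-in for PLS
        t_new = X @ w                                    # eq. (5)
        converged = np.linalg.norm(t_new - t) / np.linalg.norm(t_new) < tol
        t = t_new
        if converged:
            break
    p = X.T @ t / (t @ t)                                # loading vector
    X_osc = X - np.outer(t, p)                           # eq. (6): corrected X
    return X_osc, t, p
```

Calling the function again on `X_osc` removes a further orthogonal component. With the OLS weight step the loop alternates projections onto the column space of X and the orthogonal complement of Y, so it converges more slowly when X explains Y poorly.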

Enhanced total principal component regression (ETPCR) algorithm
The OSC-corrected process variables and the quality variables are used to build a PCR model. The covariance matrix used in PCA can only measure linear relationships between features and cannot capture nonlinear ones. Therefore, the square of the maximum information coefficient (MIC) [23] matrix replaces the covariance matrix for nonlinear feature extraction, which both enhances the correlation between the extracted features and the quality variables and captures nonlinear relationships between variables. The MIC of two variables is given in equation (7).
To illustrate how MIC measures the correlation between variables, consider the variables shown in Figure 2: $x = \mathrm{linspace}(0, 1, 800)$, $y = \sin(10\pi x) + x$, and a random vector $z = \mathrm{randn}(800, 1)$. Thus y is a nonlinear function of x, while x and z are unrelated.
MIC is used to test the correlation between x and y and between x and z. The result $\mathrm{MIC}(x, y) = 1$ shows that x and y are strongly correlated and that MIC can detect nonlinear correlation. The result $\mathrm{MIC}(x, z) = 0.0448$ indicates that x and z are unrelated. These results agree with the known relationships, indicating that MIC discriminates well between correlated and uncorrelated variables.
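The example can be reproduced approximately with the sketch below. It is a simplified stand-in for MIC, not the full estimator: it searches fixed equal-frequency grids with $n_x n_y \le n^{0.6}$ and normalizes the plug-in mutual information by $\log \min(n_x, n_y)$, instead of optimizing the partition boundaries as the MINE algorithm does. Libraries such as minepy provide the full estimator. All function names here are hypothetical.

```python
import numpy as np

def _equal_freq_bins(v, n_bins):
    # Rank-based equal-frequency binning; integer labels in [0, n_bins)
    ranks = np.argsort(np.argsort(v, kind="stable"), kind="stable")
    return ranks * n_bins // len(v)

def _mutual_info(a, b):
    # Plug-in mutual information (nats) between two integer label arrays
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1.0)
    joint /= joint.sum()
    outer = joint.sum(axis=1, keepdims=True) @ joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / outer[nz])))

def approx_mic(x, y, alpha=0.6):
    # Maximize normalized MI over grids with nx * ny <= n**alpha cells
    n = len(x)
    max_cells = int(n ** alpha)
    best = 0.0
    for nx in range(2, max_cells // 2 + 1):
        bx = _equal_freq_bins(x, nx)
        for ny in range(2, max_cells // nx + 1):
            mi = _mutual_info(bx, _equal_freq_bins(y, ny))
            best = max(best, mi / np.log(min(nx, ny)))
    return best
```

For the Figure 2 variables, `approx_mic(x, y)` is far larger than `approx_mic(x, z)`. Because the grid boundaries are not optimized, the approximation never exceeds 1 and can fall below the exact MIC of a noiseless functional relationship.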
The maximum information coefficient matrix of the OSC-corrected process variables, whose entries are $\mathrm{MIC}(x_{oi}, x_{oj})$, is calculated as shown in equation (8).
Take the square of the MIC matrix Q and denote it $H = Q^2$. Because Q is a real symmetric matrix, $Q^T = Q$, so $H = Q^T Q$. Then for any vector b, $b^T H b = b^T Q^T Q b = \|Qb\|^2 \ge 0$; that is, H is nonnegative definite. Therefore, the eigenvalue decomposition of H yields real, nonnegative eigenvalues $(\lambda_1, \lambda_2, \ldots, \lambda_p)$ and the corresponding eigenvectors. The eigenvalues are sorted in descending order, k satisfies …, and the eigenvectors corresponding to the first k eigenvalues are used to construct the projection matrix $R = [r_1, r_2, \ldots, r_k]$. The new input sample is then obtained as shown in equation (9).
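The projection step can be sketched as follows. Two assumptions are labeled in the code: the similarity function is pluggable, and the demonstration uses the absolute Pearson correlation (`abs_pearson`, a hypothetical stand-in) where the method would use MIC; and because the criterion for k is not recoverable from this text, a cumulative eigenvalue ratio of 0.85 is used as a hypothetical rule.

```python
import numpy as np

def squared_similarity_projection(Xo, similarity, cum_ratio=0.85):
    """Project Xo (n x p) onto leading eigenvectors of H = Q^2, Q[i, j] = similarity(col i, col j)."""
    p = Xo.shape[1]
    Q = np.array([[similarity(Xo[:, i], Xo[:, j]) for j in range(p)] for i in range(p)])
    H = Q.T @ Q                          # equals Q @ Q for symmetric Q; b^T H b = ||Q b||^2 >= 0
    eigvals, eigvecs = np.linalg.eigh(H)
    order = np.argsort(eigvals)[::-1]    # descending eigenvalues
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(ratio, cum_ratio)) + 1, p)   # hypothetical selection rule
    R = eigvecs[:, :k]                   # projection matrix R = [r_1, ..., r_k]
    return Xo @ R, R, eigvals

def abs_pearson(a, b):
    # Stand-in for MIC(a, b); substitute an MIC estimator in practice
    return abs(np.corrcoef(a, b)[0, 1])
```

Keeping the similarity as a parameter separates the nonnegative-definite construction of H, which holds for any symmetric Q, from the choice of dependence measure.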
Following equation (1), the projected input $X_i$ is decomposed by PCA to obtain the score matrix $T_{pc}$, and a least squares regression between the score matrix and the quality matrix gives the loading matrix $Q_{pc}$ of the quality variables (equation (10)). The quality prediction matrix is then given by equation (11). Total principal component regression further extracts the part of $T_{pc}$ that is directly related to $\hat{Y}_i$: PCA is performed again on $\hat{Y}_i$, and the new score matrix $T_{ypc}$ and loading matrix $Q_{ypc}$ are shown in equation (12).
Least squares regression of $X_i$ on $T_{ypc}$ is performed, as shown in equation (13).
Following equation (1), the quality-related part is given by equation (14).
The corresponding quality-independent part is shown in equation (15).
That is, the process variables $X_i$ are decomposed into the quality-relevant part $X_{yi}$ and the quality-irrelevant part $X_{oi}$.
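Since equations (10) to (15) are not reproduced in this text, the sketch below assumes the usual TPCR form: PCA scores $T_{pc}$ of $X_i$, least-squares quality loadings, prediction $\hat{Y}_i = T_{pc} Q_{pc}$, PCA of $\hat{Y}_i$ to get $T_{ypc}$, least-squares regression of $X_i$ on $T_{ypc}$, $X_{yi} = T_{ypc} P_y$, and $X_{oi} = X_i - X_{yi}$. Centered data and the helper names are assumptions.

```python
import numpy as np

def _pca_scores(A, k):
    # Scores of the first k principal components of centered A (rows = samples)
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    return A @ Vt[:k].T

def tpcr_decompose(Xi, Y, k):
    """Split Xi into quality-related X_y and quality-irrelevant X_o (assumed TPCR form)."""
    Y = np.asarray(Y, dtype=float).reshape(Xi.shape[0], -1)
    T_pc = _pca_scores(Xi, k)                               # PCA scores of X_i
    Q_pc = np.linalg.lstsq(T_pc, Y, rcond=None)[0]          # quality loadings
    Y_hat = T_pc @ Q_pc                                     # quality prediction
    T_ypc = _pca_scores(Y_hat, np.linalg.matrix_rank(Y_hat))  # PCA on Y_hat
    P_y = np.linalg.lstsq(T_ypc, Xi, rcond=None)[0]         # regress X_i on T_ypc
    X_y = T_ypc @ P_y                                       # quality-related part
    X_o = Xi - X_y                                          # quality-irrelevant part
    return X_y, X_o
```

The quality-related part has rank at most the number of quality variables, while the quality-irrelevant part collects all remaining variation of $X_i$.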

Batch process quality-related fault detection based on MOSC-ETPCR
After the normal batch data are preprocessed, the orthogonal signal correction algorithm filters out the components of the process variables that are orthogonal to the quality variables. The quality-related regression model is then established and processed by total principal component regression, which decomposes the variables into quality-related and quality-irrelevant subspaces. The $T^2$ statistic of the quality-relevant subspace $X_{yi}$ is shown in equation (16).
Similarly, the $T^2$ statistic of the quality-irrelevant subspace $X_{oi}$ is shown in equation (17).
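Equations (16) and (17) are not reproduced here, so this sketch uses the common PCA-based form $T^2 = \sum_a t_a^2/\lambda_a$, fitted separately on the training data of each subspace. The parameter `n_comp` is hypothetical; it should not exceed the rank of the subspace, because $X_{yi}$ has rank at most the number of quality variables and near-zero eigenvalues would inflate the statistic.

```python
import numpy as np

def fit_t2(X_train, n_comp):
    """Fit a PCA-based T^2 monitor on centered subspace data (rows = samples)."""
    cov = np.cov(X_train, rowvar=False)            # covariance with ddof = 1
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_comp]     # largest n_comp eigenvalues
    return eigvecs[:, order], eigvals[order]

def t2_statistic(X, P, lam):
    """T^2 = sum_a t_a^2 / lambda_a for each row of X."""
    T = X @ P
    return np.sum(T**2 / lam, axis=1)
```

On the centered training data itself, the mean of this statistic equals `n_comp * (n - 1) / n`, which is a convenient sanity check.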
Control limits are usually calculated under a Gaussian assumption, but actual process data are often non-Gaussian, so kernel density estimation (KDE) [24] is introduced to obtain the control limits of the $T^2$ statistics. Assume that the samples $\{X_1, X_2, \ldots, X_n\}$ follow the density f(x). Given the kernel function K(x), the estimated density of the sample data is expressed as equation (18).
where x is the sample point, $X_i$ is an observation, n is the number of observations, and K(x) is the kernel function. The window width strongly influences the kernel density estimate, and its optimal value depends on the number of samples, the data distribution, and the choice of kernel. In this paper, the window width is chosen by the mean integrated squared error (MISE) criterion [25]. The Gaussian kernel $K(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ is selected, the confidence level is set to $\alpha = 0.95$, and the control limit is obtained from equation (19).
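A minimal sketch of a KDE control limit with a Gaussian kernel. One substitution is labeled in the code: the window width uses Silverman's rule of thumb, which minimizes the asymptotic MISE only for near-Gaussian data, in place of the MISE-optimal width used in the paper. The limit is the point where the smoothed CDF reaches the confidence level, found by bisection. `kde_control_limit` is a hypothetical name.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def kde_control_limit(stats, confidence=0.95, tol=1e-8):
    """Control limit = confidence quantile of a Gaussian-kernel KDE of the statistic."""
    s = np.asarray(stats, dtype=float)
    n = s.size
    # Silverman's rule of thumb (stand-in for the MISE-optimal window width)
    iqr = np.subtract(*np.percentile(s, [75, 25]))
    h = 0.9 * min(s.std(ddof=1), iqr / 1.34) * n ** (-0.2)

    def cdf(x):
        # KDE CDF with a Gaussian kernel: mean of Phi((x - s_i) / h)
        return np.mean(0.5 * (1.0 + _erf((x - s) / (h * math.sqrt(2.0)))))

    lo, hi = s.min() - 10 * h, s.max() + 10 * h
    while hi - lo > tol:                     # bisection on the monotone CDF
        mid = 0.5 * (lo + hi)
        if cdf(mid) < confidence:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Applied to the training values of $T^2_y$ and $T^2_o$, this gives one limit per statistic; for standard normal samples the result lies close to the Gaussian 95% quantile of about 1.645.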
Offline modeling of batch process based on MOSC-ETPCR

Step 1: Select the three-dimensional training data $X(I \times J \times K)$, unfold it into $X(I \times KJ)$ by hybrid expansion, and standardize it;
Step 2: Rearrange the standardized $X(I \times KJ)$ into $X(KI \times J)$;
Step 3: Apply orthogonal signal correction to the process variables X to filter out the components orthogonal to the quality variables Y;
Step 4: Calculate the squared maximum information coefficient matrix of the corrected process variables, project the data as shown in equation (9), establish the ETPCR model, and determine the quality-related subspace $X_{yi}$ and the quality-irrelevant subspace $X_{oi}$;
Step 5: Construct the quality-relevant and quality-irrelevant statistics $T^2_y$ and $T^2_o$;
Step 6: Use KDE to obtain the control limits of $T^2_y$ and $T^2_o$.

Online monitoring of batch process based on MOSC-ETPCR
Step 1: Collect online data and standardize them with the mean and standard deviation obtained during offline modeling;
Step 2: Project the data with the ETPCR model established offline;
Step 3: Calculate the $T^2_y$ and $T^2_o$ statistics of the online data;
Step 4: Check whether the statistics exceed their control limits. If $T^2_y$ exceeds its control limit, a quality-relevant fault has occurred; if $T^2_o$ exceeds its control limit, a quality-irrelevant fault has occurred; if both exceed their control limits, the fault has affected both the process space and the quality space; otherwise, no fault has occurred.
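The decision rule of Step 4 can be written as a small vectorized function. This is a sketch: the label strings and the name `classify_samples` are illustrative, and the rule is read as mutually exclusive cases (only $T^2_y$, only $T^2_o$, both, or neither above its limit).

```python
import numpy as np

def classify_samples(t2_y, t2_o, limit_y, limit_o):
    """Label each online sample according to which statistics exceed their limits."""
    over_y = np.asarray(t2_y) > limit_y
    over_o = np.asarray(t2_o) > limit_o
    labels = np.full(over_y.shape, "normal", dtype=object)
    labels[over_y & ~over_o] = "quality-relevant fault"
    labels[~over_y & over_o] = "quality-irrelevant fault"
    labels[over_y & over_o] = "process and quality affected"
    return labels
```

For example, `classify_samples([1, 5, 1, 5], [1, 1, 5, 5], 2, 2)` returns one label of each kind, in the order listed in the function body.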
Figure 3 is the flow chart of quality-related fault detection based on MOSC-ETPCR.

Experimental verification and analysis
Two cases are used to verify the effectiveness of the MOSC-ETPCR algorithm: a numerical simulation of a nonlinear batch process and the typical penicillin fermentation process. For comparison, several quality-related fault detection methods for batch processes are selected, namely multiway PLS (MPLS), multiway total principal component regression (MTPCR), multiway modified PLS (MMPLS), and multiway expectation maximization partial robust M-regression (MEMPRM) [26]. These methods build regression models based on quality-related and quality-unrelated subspaces. MPLS, MMPLS, MTPCR, MEMPRM, and MOSC-ETPCR are applied to both cases, and their fault detection performance is compared.

Numerical simulation example
In the experiment, eight process variables and one quality variable are selected as numerical simulation examples.The entire numerical system is expressed as equation (20).
where $[s_1, s_2]^T$ are latent variables uniformly distributed on [0.01, 2], and $[e_1, e_2, e_3, e_4, e_5, e_6, e_7, e_8]^T$ is Gaussian white noise with a standard deviation of 0.1. Small changes are added to each normal batch to simulate different batch runs. A total of 25 batches of process data are generated by simulating the numerical system 25 times, each batch having 200 sampling points. In addition to the normal training data, a ramp fault with an amplitude of 0.2 is added to $x_2$ from the 101st sampling point to verify the effectiveness of the algorithm. Because the quality variable is $y = x_1 + 2x_2 x_3 + x_1 x_4 + e_9$, a fault in $x_2$ affects the final quality variable y.
MPLS, MMPLS, MTPCR, MEMPRM, and MOSC-ETPCR are applied to the numerical simulation for fault detection, and the results are shown in Figure 4. The blue lines between sampling points 0 and 100 represent normal process data, the red solid line represents the control limit, and the purple-red dashed line represents the statistic after the fault is introduced. The detection result of MPLS is shown in Figure 4(a): the $T^2$ statistic fails to detect the fault throughout the process, while the SPE statistic detects it at the 127th sampling point, so detection is delayed and the fault detection rate is relatively low. Figure 4 …

Penicillin fermentation process
To further verify the MOSC-ETPCR method, the typical penicillin fermentation batch process is used to evaluate fault detection performance.[27] The penicillin fermentation process generally consists of three stages. The first is the cell growth stage (1-50 h), in which the reaction time is short and the process is relatively stable. The second is the penicillin synthesis stage (51-290 h). The third is the autolysis stage (291-400 h), in which the reaction is relatively stable and lasts a short time. In all three stages, the main factors affecting penicillin fermentation efficiency are dissolved oxygen concentration, substrate concentration, pH, and temperature.
The Pensim2.0 simulation platform [28] was used to test fault detection performance. Figure 5 shows the flow chart of penicillin fermentation.
Each penicillin fermentation batch lasts 400 h with a sampling interval of 1 h, and 30 normal batches are used for modeling. Eleven measured variables are selected (Table 1), of which the product concentration is the quality variable, so the normal training data are $X(30 \times 10 \times 400)$ and $Y(30 \times 1 \times 400)$. In addition, six fault batches with different fault variables and amplitudes, namely step and ramp faults in the aeration rate, in the agitator power, and in the substrate feed rate, are used to verify the effectiveness of MOSC-ETPCR. The specific fault types are given in Table 2. MPLS, MMPLS, MTPCR, MEMPRM, and MOSC-ETPCR are compared in terms of fault detection performance.
Figure 6 shows the fault detection results of the MPLS, MMPLS, MTPCR, MEMPRM, and MOSC-ETPCR algorithms under fault 1, a step fault with an amplitude of 6 added between 150 and 300 h. The $T^2$ and SPE statistics of MPLS under fault 1 are shown in Figure 6(a). The $T^2$ statistic monitors the quality-related process variables: when the fault is introduced at the 150th sampling point, the quality-related fault is detected quickly, but the statistic also exceeds the control limit between sampling points 0 and 40 and between 300 and 400, producing false alarms. The SPE statistic fails to detect the fault throughout the process. Figure 6 … samples. Compared with step faults, ramp faults are harder to detect promptly because the process variables change slowly. Figure 7(a) shows the $T^2$ and SPE statistics of MPLS: the $T^2$ statistic detects the quality-related fault at the 332nd sampling point, and the SPE statistic does not detect the fault. Figure 7 … of MOSC-ETPCR. The $T^2_y$ statistic detects the quality-related fault at the 226th sampling point and the $T^2_o$ statistic detects the quality-independent fault at the 233rd sampling point, so the detection performance is better than that of MPLS, MMPLS, MEMPRM, and MTPCR.
The MPLS, MMPLS, MTPCR, MEMPRM, and MOSC-ETPCR algorithms were compared on the penicillin fermentation process, and the fault detection rates (FDR) and false alarm rates (FAR) are listed in Tables 3 and 4. FDR and FAR are defined as $\mathrm{FDR} = N_{fault}(J > J_{th}) / N_{fault} \times 100\%$ and $\mathrm{FAR} = N_{normal}(J > J_{th}) / N_{normal} \times 100\%$, where J represents the statistic, $J_{th}$ represents its control limit, $N_{fault}$ and $N_{normal}$ are the numbers of faulty and normal samples, and $N(J > J_{th})$ counts the samples of that kind whose statistic exceeds the limit. Except for fault 5, MOSC-ETPCR outperforms MPLS, MTPCR, MMPLS, and MEMPRM, with the highest FDR and the lowest FAR. For faults 1 and 3, all of the algorithms detect the fault in time, but MOSC-ETPCR has a lower FAR than MPLS, MMPLS, MTPCR, and MEMPRM. Fault 5 is a substrate feed rate fault, and none of the algorithms detects it immediately; the detection delay arises because a change in the glucose substrate feed rate propagates slowly to the related variables. The tables also show that ramp faults 2, 4, and 6 are harder to detect than step faults, because ramp faults make the variables change more slowly. For ramp faults, MOSC-ETPCR detects the fault more promptly and accurately than MPLS, MMPLS, MTPCR, and MEMPRM.
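The FDR and FAR definitions translate directly into code. In this sketch, `fault_mask` is a hypothetical boolean array marking the samples inside the fault window (for example, samples 150 to 300 for fault 1); every other sample counts as normal.

```python
import numpy as np

def fdr_far(stat, limit, fault_mask):
    """FDR and FAR (percent) for a statistic J and its control limit J_th."""
    stat = np.asarray(stat, dtype=float)
    fault_mask = np.asarray(fault_mask, dtype=bool)
    alarms = stat > limit                          # J > J_th
    fdr = 100.0 * alarms[fault_mask].mean()        # alarms among faulty samples
    far = 100.0 * alarms[~fault_mask].mean()       # alarms among normal samples
    return fdr, far
```

For example, with `stat = [0.5, 3, 0.2, 4, 1.5, 0.1]`, `limit = 1`, and the last three samples faulty, two of three faulty samples and one of three normal samples raise alarms, giving an FDR of about 66.7% and a FAR of about 33.3%.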

Conclusion
This paper proposes a quality-related fault detection method for batch processes based on multi-way orthogonal signal correction and enhanced total principal component regression (MOSC-ETPCR). First, after the batch process data are preprocessed, the orthogonal signal correction algorithm filters out the components of the process variables that are independent of quality. Second, the covariance matrix in principal component regression is replaced by the squared maximum information coefficient matrix, which enhances the correlation between quality variables and process variables and captures both linear and nonlinear relationships among variables. Third, monitoring statistics are constructed, and KDE is used to obtain their control limits for quality-related fault detection in the batch process. Finally, a numerical simulation and the penicillin fermentation process show that MOSC-ETPCR achieves a higher FDR and a lower FAR than the compared methods in quality-related fault detection of batch processes.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figure 1. Hybrid expansion of 3D data in batch process.

Figure 3. Flow chart of quality-related fault detection based on MOSC-ETPCR.

Figure 5. Reaction flow chart of the penicillin fermentation process.

Figure 7 is the fault detection diagram of the MPLS, MMPLS, MTPCR, MEMPRM, and MOSC-ETPCR algorithms under fault 4. Fault 4 is a ramp fault with an amplitude of 0.5 added between the 150th and 400th samples.

Table 1. Process variables selected in the fermentation process of penicillin.

Table 2. Types of faults added in the fermentation process of penicillin.
(b) is $T^2$