On the application of artificial neural networks for the prediction of NOx emissions from a high-speed direct injection diesel engine

This article considers the application and refinement of artificial neural network methods for the prediction of NOx emissions from a high-speed direct injection diesel engine over a wide range of engine operating conditions. The relative computational cost and performance of two backpropagation algorithms, Levenberg–Marquardt and Bayesian regularization, are compared for this application, with the Levenberg–Marquardt algorithm demonstrating a significant cost advantage. This work also assesses the performance of two alternative filtering approaches, a p-value test and the Pearson correlation coefficient, for reducing the number of input variables required by the model. The p-value test identified 32 input parameters of significance, whereas the Pearson correlation test highlighted 14 significant parameters while additionally providing a ranking of their relative importance. Finally, the article compares the predictive performance of the models generated by the two filtering methods. Overall, both models show good agreement with the experimental data, with the model created using the Pearson correlation test showing improved performance in the low-NOx region.

Introduction

NOx emissions from diesel vehicles are tightly legislated and, as a consequence, almost all modern diesel vehicles use a range of bulky, complex and expensive aftertreatment devices to reduce tailpipe NOx emissions to near zero. 1 Accordingly, there is a strong incentive to reduce engine-out NOx emissions to lower the aftertreatment burden and also to improve 'cold-start' performance. 2 NOx formation occurs during the combustion event, typically at temperatures higher than 2000 K and when the local air-fuel ratio (AFR) is weak of stoichiometric. 3 A number of engine parameters affect engine-out NOx emissions, including humidity, inlet air temperature, exhaust gas recirculation (EGR) rate, fuel injection pressure, global AFR and injection timing. 4 In addition to these parameters, combustion chamber geometry and the details of the fuel-air mixing also have an essential effect. 5,6 Given the complex nature of NOx formation in diesel engines, developing a predictive model to understand the relative importance of all of these parameters is of significant interest. Due to the highly transient nature of diesel engine combustion, predictions of NOx emissions need to account for many interacting processes. The coupling between these processes makes it more challenging to isolate and identify the leading parameter, or the combined effects of various parameters, for NOx emissions. Different conventional approaches have been used to model the system, including multivariate analysis, linear correlations and some symbolic regression algorithms. 7,8 However, conventional methods require prior knowledge of the process relationships for accurate modelling, which is a great challenge for a stochastic, complex and nonlinear combustion process. In recent years, emerging artificial intelligence technologies have proven to be effective in a wide range of scientific fields involving nonlinear processes. In engine research, studies have favoured artificial neural networks (ANNs) as the predictive modelling tool. Compared to other predictive nonlinear models, the main advantage of an ANN lies in its ability to identify hidden, nonlinear, highly complex correlations between the measured input and output data. The modelling process does not require any governing equations for the parameters to be predicted, therefore substantially reducing the time and cost associated with engine development and model building. Being a machine learning tool, an ANN has the ability to re-learn when new data are available, which can further increase the model's accuracy. Additional input and output variables can also be added or removed if necessary.
Researchers have used ANNs in the field of engine design and combustion characteristics studies. ANNs have been used for diesel engine controller design, 9 mechatronic control system design 10 and idle speed regulation control design. 7 Various researchers have also looked into the application of ANNs to combustion characteristics, including the reduction of dimensionality in tabulated chemistry 11 and combustion control. 12,13 A number of studies have also been conducted to predict the emissions and performance of internal combustion engines (ICEs) using the ANN approach. Deng et al. 14 analysed the effect of cetane number on exhaust emissions with a neural network. It was shown that a backpropagating neural network was able to determine the functional relationships between total cetane number, base cetane number and cetane improver, as well as between total cetane number, nitrogen content and emissions of hydrocarbons (HC), carbon monoxide (CO), particulate matter (PM) and NOx. Parlak et al. 15 examined the feasibility of using an ANN to predict specific fuel consumption and exhaust temperature for a diesel engine. Mauro et al. 16 investigated the ability of neural networks to accurately model the indicated mean effective pressure (IMEP) and its coefficient of variation (COV of IMEP) in a spark-ignited ICE. A large experimental dataset was used to construct the model, giving a strong correlation between the modelled COV and the experiments. However, a systematic overprediction of COV was observed for low COVs, while higher COVs were underpredicted by the ANN model; this systematic behaviour is likely caused by physical parameters missing from the ANN input. Additional studies have also been carried out using ANNs to characterize diesel engine emissions from engine operating parameters. 17-20 The existing literature suggests that ANNs are a powerful tool capable of identifying the complex correlation between engine operating parameters and NOx emissions within the range of experimental test conditions. However, little investigation has been devoted to how to identify the required engine operating parameters for the ANN model.
In this study, we further explore the applicability of the ANN method for the prediction of NOx emissions under a wide range of engine operating conditions. The dataset includes 7 months of engine testing from the University of Oxford single-cylinder diesel research engine. The engine and its test cell have been designed to give the highest quality data, 21-23 and much of this dataset has already been published. 4,22,24 Overall, the dataset consisted of 1108 individual experiments. Based on this comprehensive experimental dataset of a single-cylinder direct injection diesel engine, we first quantitatively evaluate whether neural networks can accurately predict the NOx emissions within the training operating range using all input parameters. A specific focus was given to the ability of the trained ANN model to capture the effect of EGR and other parameters on NOx emissions outside the training dataset range. Multivariate statistical analysis is then performed to identify the statistically most significant input parameters, and the significance of the controlling operating parameters on NOx is given. Finally, we evaluate the ANN's ability to predict NOx while eliminating the lesser-weighted input operating parameters. The reduced model's accuracy is validated against experimental conditions that are not part of the training dataset and compared against the original ANN model, which includes all input parameters.

Engine and instrumentation
The engine used in this study was a single-cylinder direct injection diesel engine with the bottom end based on a Ricardo Hydra research engine. EGR was achieved via a high-pressure EGR system, with the exhaust gases passing through an EGR cooler prior to entering the inlet manifold where mixing with the fresh intake charge took place. A more detailed description of the engine peripherals and configuration can be found in previous publications. 5 Table 1 presents the details of the test engine.
In-cylinder pressure data were measured using a Kistler piezoelectric transducer and logged with a high-speed data acquisition unit at a resolution of 0.1°CA. Low-frequency channels were logged at 1 Hz using a CADET engine control system by Sierra-CP Engineering. Fuel flow measurements were carried out using

Test conditions and training set
For each test point, the low-speed data were logged for 180 s while the high-speed data were logged for 300 cycles. The high-speed data were also logged by the CADET system (low-speed data acquisition), with their values updated every 300 cycles. This allowed for a test file that integrated both low- and high-speed data. The engine was operated over a wide range of speed/load conditions by performing five-point EGR sweeps, except under full-load conditions. Overall, seven test points were considered, each defined by a specific speed/load combination. For each speed/load condition, boost pressure, exhaust back-pressure, rail pressure and inlet temperature were also varied, resulting in a much wider test envelope. The maximum EGR rate was set by a smoke limit, the same value independent of test conditions. The EGR rate was then reduced in equal steps to give a five-point EGR sweep for each test case. Table 2 presents the different test points along with the range of the parameters of interest that were tested. Each test point comprised 180 individual logs (one point per second), which resulted in a total of ~128,000 points after data pre-processing.
Data pre-processing was carried out prior to model training in order to remove any outliers that would skew the results. This included points that varied more than the acceptable limit from the target IMEP (±0.2 bar) and speed (±20 r/min), ensuring that any operator errors were removed. In addition, for the low-speed/low-load conditions under maximum EGR rates, the engine was found to enter a low-temperature combustion regime, indicated by very low NOx (~10 ppm) and smoke values (0.5 FSN), which is very susceptible to combustion instability. Consequently, EGR values above 60% were removed from the training set as they showed very high variance in IMEP and NOx. This is expected to increase the model's accuracy even at low NOx values. For reasons of commercial confidentiality, all fuel flow and emissions results have been rescaled by an arbitrary value (and hence are presented in arbitrary units).
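To make this pre-processing step concrete, the following minimal pandas sketch applies the filters described above. The file name and column names (nIMEP, speed, EGR_rate) are illustrative assumptions rather than the study's actual channel names, and the target values shown correspond to one example test point.

```python
import pandas as pd

# Hypothetical file of 1 Hz low-speed logs; column names are assumptions.
df = pd.read_csv("engine_logs.csv")

TARGET_IMEP = 3.8    # bar, example test-point target
TARGET_SPEED = 1500  # r/min, example test-point target

mask = (
    ((df["nIMEP"] - TARGET_IMEP).abs() <= 0.2)    # IMEP within +/-0.2 bar
    & ((df["speed"] - TARGET_SPEED).abs() <= 20)  # speed within +/-20 r/min
    & (df["EGR_rate"] <= 60.0)                    # drop unstable high-EGR points
)
clean = df[mask]  # outlier-free points retained for training
```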

Data uncertainty
As with every model, the accuracy of the modelled results will depend strongly on the experimental inputs. The variance observed across repeated samples is due either to unexpected sources of error, which are random in nature, or to inaccuracies associated with the measurement equipment. Since inaccuracies associated with the measurement equipment (usually referred to as bias error) can be accounted for by calibration, this section focuses on uncertainties that are random in nature. As already mentioned, each test point was logged for 180 s by the low-speed data acquisition system and repeated several times over different days to remove any bias error. An indication of the associated uncertainty for each test point was given by the 95% confidence level. Figures 1 and 2 show the scaled fuel flow rate and the normalized NOx for the 1500 r/min/3.8 bar nIMEP test point. This point is considered to be representative of the worst-case scenario in terms of experimental uncertainty as the signal-to-noise ratio is expected to be the lowest. The results presented here are averaged over three runs and, as can be seen, the associated uncertainty is very small, indicating a high confidence level in the data. It is worth noting that any uncertainties associated with nIMEP and the rail pressure, both parameters used to define a given test point (Table 2), are expected to be reflected in the fuel flow rate measurement. In fact, the associated uncertainty for both nIMEP and the rail pressure was measured to be less than 0.3% at the 95% confidence level for the results shown in Figures 1 and 2. Equally, the measured NOx is expected to capture any uncertainties related to the intake and exhaust back-pressure, as well as to the intake temperature, since these parameters affect the EGR rate and consequently NOx.
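As an illustration of how such an uncertainty estimate can be computed, the sketch below derives a two-sided 95% t-interval for the mean of a logged channel. The file name and the assumption of a 180-sample log are for illustration only.

```python
import numpy as np
from scipy import stats

def mean_ci95(samples: np.ndarray) -> tuple[float, float]:
    """Return the sample mean and the half-width of its 95% confidence interval."""
    n = samples.size
    mean = samples.mean()
    sem = samples.std(ddof=1) / np.sqrt(n)           # standard error of the mean
    half_width = stats.t.ppf(0.975, df=n - 1) * sem  # two-sided 95% t-interval
    return mean, half_width

fuel_flow = np.loadtxt("fuel_flow_run1.txt")  # hypothetical 180 s log at 1 Hz
mean, hw = mean_ci95(fuel_flow)
print(f"fuel flow = {mean:.3f} +/- {hw:.3f} (95% CI)")
```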

Neural network design
This section presents the fundamental theory behind the neural network configuration employed, as well as details regarding the network's setup for accurate NOx prediction. Neural networks originate from attempts to find numerical representations of information processing in biological systems. Over the past few decades, neural networks have been developed and implemented in many different industries for statistical pattern recognition. 25 There are several neural network structures depending on the nature of the problem at hand; a detailed discussion of the various configurations is outside the scope of this work. In this study, a commonly recognized and used ANN structure, the multilayer perceptron (MLP, sometimes loosely referred to as the feedforward ANN), was constructed. The MLP uses a supervised learning method called backpropagation to train the neural network model. A typical structure for this type of ANN can be seen in Figure 3. It consists of one input layer, a number of hidden layers and one output layer. With the exception of the nodes in the input layer, each node is known as a neuron. In order to introduce nonlinearity into the network, an activation function is needed for the neurons in the hidden and output layers. A neural network model can be seen as a nonlinear function from a set of input variables $x_i$ to a set of output variables $y_k$, controlled by various adjustable parameters. To get the best prediction from the network, many parameters need to be adjusted, including biases, weights, the number of neurons and the type of activation function. In the following section, the construction of the neural network is detailed.

Neural network background
For the neural network in this study, the activation function is chosen to be a continuous, differentiable log-sigmoid function given by

$$\sigma(a) = \frac{1}{1 + e^{-a}} \quad (1)$$

The neural network function can then be written as

$$y_k(\mathbf{x}, \mathbf{w}) = \sigma\left( \sum_{j=1}^{M} w^{(2)}_{kj}\, \sigma\left( \sum_{i=1}^{D} w^{(1)}_{ji} x_i + w^{(1)}_{j0} \right) + w^{(2)}_{k0} \right) \quad (2)$$

Following the sign convention, the parameters $w^{(2)}_{kj}$ and $w^{(1)}_{ji}$ are referred to as the weights and the parameters $w^{(1)}_{j0}$ and $w^{(2)}_{k0}$ are referred to as biases. The vector $\mathbf{w}$ collects all of the weight and bias parameters. The superscript (1) indicates that the corresponding parameters are in the 'first layer' of the network, and the superscript (2) corresponds to the second layer. Here, $k = 1, \ldots, K$, where $K$ is the total number of outputs. In order to find the optimal $\mathbf{w}$ for the model, we need to minimize the error of the network. The error function used here is the mean square error (MSE) function given by

$$E(\mathbf{w}) = \frac{1}{N} \sum_{n=1}^{N} \left( y(\mathbf{x}_n, \mathbf{w}) - t_n \right)^2 \quad (3)$$

where $y(\mathbf{x}_n, \mathbf{w})$ is the model output and $t_n$ is the corresponding target output. There are different ways to minimize the error function; for a feedforward neural network structure, backpropagation learning algorithms are widely used. In this study, two variants of the backpropagation algorithm are applied and compared. The first is a nonlinear numerical optimization technique called Levenberg-Marquardt (LM). 7 LM is a second-order method providing faster and more accurate solutions owing to its inclusion of second-order derivative information about the error. The LM algorithm has been used successfully for training in engineering applications, with significant advantages over other methods when given a medium-sized dataset. 9,19 The second backpropagation algorithm explored in this study is the Bayesian regularization backpropagation algorithm. In this algorithm, the weights and biases of the network are treated as random variables with specified distributions. Regularization parameters related to the unknown variances associated with these distributions are added to the error function of equation (3), giving
$$F(\mathbf{w}) = \beta E(\mathbf{w}) + \alpha E(D) \quad (4)$$

where $E(D)$ is the sum of the squared network weights, that is, $E(D) = \sum_{ij} \| w_{ij} \|^2$. The two parameters $\alpha$ and $\beta$ control the weighting of the two parts $E(\mathbf{w})$ and $E(D)$. For $\alpha \ll \beta$, the network will minimize the error without really trying to keep the weights low. For $\alpha \gg \beta$, the network will minimize the weights, allowing for some more error. In practice, this means that a large $\alpha$ will stop the network from overfitting, which leads to better generalization at the expense of a larger training error. The estimation of these parameters can be achieved with the Bayesian regularization method, which is beyond the scope of this article; a detailed discussion of the use of Bayesian regularization in combination with LM training can be found in Dan Foresee et al. 26 As can be seen from equations (3) and (4), the difference between the LM algorithm and the Bayesian regularization backpropagation algorithm is that one minimizes the MSE (LM) whereas the other minimizes a weighted sum of squared errors and squared weights (Bayesian). The latter is typically used to prevent overtraining and overfitting of the network without using a validation subset. It is therefore most useful for small datasets, as it keeps the training subset as large as feasible. In this study, both methods were explored for the construction of the neural network.
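To make equations (1) to (4) concrete, the following numpy sketch implements the forward pass of a one-hidden-layer MLP with log-sigmoid activations, together with the two objective functions. The array shapes and names are illustrative assumptions, and the LM and Bayesian training machinery itself is omitted.

```python
import numpy as np

def logsig(a):
    """Log-sigmoid activation, equation (1)."""
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, W1, b1, W2, b2):
    """Network function, equation (2). Shapes (assumed):
    x: (D,), W1: (M, D), b1: (M,), W2: (K, M), b2: (K,)."""
    z = logsig(W1 @ x + b1)     # hidden-layer activations
    return logsig(W2 @ z + b2)  # output-layer activations

def mse(params, X, T):
    """Mean square error, equation (3). X: (N, D) inputs, T: (N, K) targets."""
    preds = np.array([forward(x, *params) for x in X])
    return np.mean((preds - T) ** 2)

def regularized_objective(params, X, T, alpha, beta):
    """Bayesian-regularization objective, equation (4): beta*E(w) + alpha*E(D).
    Here the penalty sums squared weights and biases together, an assumption."""
    E_w = mse(params, X, T)                       # data misfit term E(w)
    E_D = sum(np.sum(p ** 2) for p in params)     # squared-parameter term E(D)
    return beta * E_w + alpha * E_D
```

The relative sizes of alpha and beta play the role described above: a larger alpha penalizes large weights more heavily, trading a higher training error for better generalization.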

Neural network configuration
A common issue when training a neural network is overfitting, where the model matches the training data very closely but fails to give accurate predictions for a new dataset. A common way to avoid this is to split the dataset into different parts. Depending on the learning algorithm, the data can be split into either three parts for training, validation and testing (LM approach), or two parts for training and testing only (Bayesian regularization approach). The training and testing datasets should include representative samples of all the cases the network handles. For this work, the data were split as follows: training 70%, validation 10% and testing 20% for the LM approach, and training 70% and testing 30% for the Bayesian regularization approach. The data selection for each subset was carried out randomly to remove any training bias.
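A minimal sketch of such a random split is shown below; X and y stand for the pre-processed inputs and NOx targets (assumed), and the fixed seed is only for reproducibility of the illustration.

```python
import numpy as np

# Random 70/10/20 split used for the LM approach (for Bayesian regularization,
# fold the validation indices into the test set to obtain 70/30).
rng = np.random.default_rng(seed=0)
n = len(X)                         # X, y: pre-processed dataset (assumed)
idx = rng.permutation(n)           # random ordering removes training bias

n_train = int(0.70 * n)
n_val = int(0.10 * n)              # validation subset: LM approach only

train_idx = idx[:n_train]
val_idx = idx[n_train:n_train + n_val]
test_idx = idx[n_train + n_val:]   # remaining 20%
```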
Due to the large amount of data used for training, test points with similar test conditions could appear in all three subsets mentioned above. This would then affect how well the model can generalize to test points it has not been trained on. Consequently, two separate test cases (listed as the verification dataset), in terms of load and engine speed, were removed from the dataset and later used to independently evaluate the performance of the model, as shown in Table 2 and Figure 4.
Following the accepted convention, the training of the network terminates when the MSE for the testing dataset reaches an inflection point and starts to increase. It is therefore important that the testing dataset is not introduced into the training procedure. Different parameters need to be considered when building a predictive neural network model. The present study was initialized with 87 input parameters and 1 output parameter (i.e. NOx emissions). These input parameters were initially chosen based on experience and without any quantifiable metric; later sections discuss possible metrics for parameter reduction. For a successful feedforward neural network study, the number of hidden neurons is critical for model accuracy. Although there is no absolute rule for identifying the right number of neurons, a widely accepted method is the empirical formula given in ref. 27 (equation (5)), which relates the optimal number of hidden neurons to the numbers of input ($I$) and output ($O$) parameters and the number of training patterns available in the system ($P_i$). Using this equation, the optimal number of hidden neurons required, considering the 87 input parameters mentioned above, is 44. These results were further backed by a neuron sensitivity test, in which the MSE over a range of neuron numbers was estimated. Figure 5 shows the MSE results over a range of neuron numbers using the LM learning algorithm. The MSE is seen to decrease with an increasing number of hidden neurons at an early stage. This reduction in MSE is explained by the enhanced ability of the ANN model to solve nonlinear and complex problems as the number of neurons increases. After the first reduction in MSE, several fluctuations are observed. This is likely due to the additional adjustment needed for the same training information when more neurons are introduced. The MSE is seen to be smallest for all three sets of data near 45 hidden neurons. The MSE fluctuates again when there are more than 50 hidden neurons, which is likely caused by overfitting. A similar study was also performed for the Bayesian regularization method. For the accuracy and computational efficiency of this study, 45 neurons were initially chosen, with the neuron number being adjusted depending on the number of input parameters to the model. When the number of input parameters was reduced, a process similar to the one shown in Figure 5 was performed in order to identify the best compromise between accuracy and training time. Finally, a single hidden layer was employed throughout this work.
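A neuron sensitivity test of this kind can be sketched as below. The study itself trained with the LM algorithm, which scikit-learn's MLPRegressor does not provide, so this snippet only illustrates the sweep procedure, not the exact training setup; X_train, y_train, X_test and y_test are assumed to be the subsets defined earlier.

```python
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

# Train the same single-hidden-layer network over a range of hidden sizes and
# record the test MSE, mirroring the sweep shown in Figure 5.
results = {}
for n_hidden in range(5, 90, 5):
    net = MLPRegressor(hidden_layer_sizes=(n_hidden,), activation="logistic",
                       max_iter=2000, random_state=0)
    net.fit(X_train, y_train)
    results[n_hidden] = mean_squared_error(y_test, net.predict(X_test))

best = min(results, key=results.get)  # hidden size with the lowest test MSE
```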

Results and discussion
In this section, the simulation results of the chosen neural network configuration are discussed. First, the correlation between the simulated and experimental data is presented for both the LM algorithm and the Bayesian algorithm, and the two algorithms are compared in terms of model accuracy and computational cost. The chosen algorithm is then used as the predictive model to predict NOx emissions outside the training operational range; subsequently, a p-value test is conducted on the training data to identify the key controlling parameters for predicting emissions. The effect of the major controlling parameters on NOx is highlighted. Finally, a reduced-input-variable ANN model is presented using the statistically more significant controlling parameters.

NOx predictions using the ANN model
In order to determine the performance of the model, we considered both the correlation coefficient ($r$) and the coefficient of determination ($R^2$). The Pearson correlation coefficient indicates how well two parameters $X$ and $Y$ are correlated based on a straight-line model. Its value can range from −1 (i.e. an increase in one parameter leads to a decrease in the other) to 1 (i.e. an increase in one parameter leads to an increase in the other), with a value of 0 indicating that the two variables are not linearly correlated. The Pearson correlation coefficient and the coefficient of determination are defined as

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (t_i - O_i)^2}{\sum_{i=1}^{n} (t_i - \bar{t})^2}$$

where $n$ is the sample size of the given paired data $(x_1, y_1), \ldots, (x_n, y_n)$, $\bar{x}$ and $\bar{y}$ are the sample means of variables $x$ and $y$, $t_i$ is the experimental output and $O_i$ is the model output.

Figures 6 and 7 show the performance of the model predictions for both ANN algorithms. Here, the correlation coefficient is used to describe the relationship between two or more parameters. Both algorithms show a very high correlation coefficient, with all data points clustered near the unity-slope line. The results also demonstrate the ability of the ANN to predict experimental observations over a wide range of operating conditions. Although no significant difference is seen in the prediction accuracy of the two methods, the computational time required for the two models is significantly different. Table 3 shows the CPU hours required for building the ANN model using these two algorithms (the CPU used in this study is an Intel(R) Xeon(R) CPU E5-2680 v3). The LM algorithm is therefore chosen as the learning algorithm for the remainder of this study.
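For reference, the two metrics can be implemented directly from the definitions above; the function and argument names are our own.

```python
import numpy as np

def pearson_r(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation coefficient between two paired samples."""
    xm, ym = x - x.mean(), y - y.mean()
    return np.sum(xm * ym) / np.sqrt(np.sum(xm**2) * np.sum(ym**2))

def r_squared(t: np.ndarray, o: np.ndarray) -> float:
    """Coefficient of determination; t: experimental outputs, o: model outputs."""
    ss_res = np.sum((t - o) ** 2)           # residual sum of squares
    ss_tot = np.sum((t - t.mean()) ** 2)    # total sum of squares
    return 1.0 - ss_res / ss_tot
```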
As previously described, the testing data are randomly selected and separated from the training process. Figure 8 shows the absolute error of the LM ANN model predictions for the testing subset. As seen from the graph, the majority of the points are clustered within the band of (−10, 10) in terms of absolute error, which shows the good predictive capability of the model. Although the absolute error for all testing points presented so far is very close to the experimental accuracy, one could argue that the randomly selected testing points are relatively close to the training dataset. Therefore, the capability of the current, rigorously built ANN model was examined on independent datasets whose conditions differ substantially from the training data conditions (identified as test points 6 and 7 in Table 2). Figure 9 shows the accuracy of the ANN model for the average NOx predictions using the independent verification tests. It is worth mentioning again that each verification point on the graph consists of 180 engine data points. A complete EGR sweep was performed for each verification test, covering a wide range of NOx values. The figure shows a high correlation between the predictions and the experimental observations, with a coefficient of determination (R²) of 0.9655.
For a more detailed analysis of the ANN accuracy based on the 87 control parameters, Figure 10 shows the mean absolute error for the verification test points. The error is seen to be smaller at higher NOx values and to increase towards the low-NOx region. This is likely caused by the sensitivity of the ANN model at lower target values when a large number of input parameters exists. It is also worth noting that measuring low NOx emissions is subject to higher variations in the experimental measurements. It follows that it would be of great value to determine the significance of each input parameter to NOx and remove any parameters showing low correlation with NOx. A reduced model can then be built, requiring less computational time and introducing less perturbation into the system. As described, the ANN model used up to this point of the study involved 87 input parameters, chosen on the basis of the authors' prior experience with ICEs. The computational time required to build this model is relatively high due to the large amount of data, and small errors can be introduced by statistically insignificant input variables in the data. For the LM algorithm, with the current dataset, a minimum of 70 CPU hours is required. Moreover, the large number of input variables can also have an impact on model complexity, learning difficulty and model performance. Therefore, it is preferable to strategically decrease the size of the neural network by eliminating statistically insignificant input parameters, thereby increasing the computational efficiency and reducing the errors introduced by the input parameters.
For any statistical method, the identification of suitable inputs is largely dependent on the discovery of the input relationships within the available data. For parametric or semi-parametric empirical models, an a priori assumption of the functional form of the model, based on the physical interpretation of the system, can help select the appropriate inputs. However, the ANN model is developed from the input variables given in the available data with no prior model structure. The nonlinearity, inherent complexity and non-parametric nature of ANNs make it difficult to apply many existing analytical variable selection tools. A desirable input variable for an ANN needs to be highly informative and dissimilar to other inputs (independent). The main challenges of selecting inputs for an ANN lie in (1) the large number of available variables, (2) the inter-correlation between input variables, which creates redundancy, and (3) the limited predictive power of certain inputs.
Approaches for selecting input variables can be generally categorized into three main classes: wrapper, 28 embedded 29 or filter algorithms. 30 For the wrapper and embedded algorithms, the input selection process coincides with the ANN training process, either being part of the optimization process or directly incorporated into the ANN training algorithm. The filter algorithm, on the other hand, distinctly isolates the input variable selection process from the ANN training, where an auxiliary statistical analysis technique can be used to measure the relevance of individual, or combinations of, input variables. The use of the filter method gives a model-free approach for distinguishing the important variables. A typical filter approach for input variable selection is shown in Figure 11, where an incremental search strategy is used to evaluate each candidate-output relationship. In this study, we focused on commonly used filter strategies to study the controlling parameters for the optimal ANN algorithm giving the best NOx predictions.

P-value test
Prior to the input variable selection process, the initial 87 parameters were screened for duplicates, which were then removed. This left a total of 41 parameters and removed unwanted bias on the model's output. Initially, the input parameters to the model were tested through a p-value test using a linear regression model. In statistical hypothesis testing, the p-value indicates, for a set of observations, the probability of obtaining results at least as extreme as those observed if the null hypothesis were true. More specifically, in our case, the null hypothesis states that the selected input parameter is not linearly correlated with NOx. In practice, a low p-value indicates that the two parameters are more likely to be correlated. In this work, a significance level of 5% was used, which means that if the p-value is smaller than 0.05, the two parameters tested (i.e. the input parameter and NOx) have a 95% chance of being linearly correlated.
Given the number of input parameters used in this study, a correction method was needed to counteract the problem of multiple comparisons. Here, the Bonferroni correction is used, whereby an input parameter is deemed statistically significant when its p-value is less than p = 0.05/41. Studies have shown that, with a sufficiently large sample, a statistical test will almost always indicate a significant difference unless there is absolutely no effect. 31 For example, if the sample size is 100,000, a significant p-value is likely to be found even if the difference in outcomes between variables is negligible. Thus, while a p-value test can inform the reader whether an effect exists, it will not reveal the size of the effect, and the p-value will be confounded unless a suitable sample size is chosen. In order to obtain more representative statistics for the input parameters, in this study the test dataset was sampled across the entire training set with 1000 randomly selected discrete samples. A p-value test was then performed on each of these test datasets, for a total of 100 p-value tests within the training data. Figure 12 presents the results of the 100 p-value tests, giving the frequency with which each parameter passed the test. The results show that, among the 41 input parameters, around 14 parameters passed the p-value test 90% of the time, showing statistical significance to NOx emissions, and around 15 parameters failed to pass the test 90% of the time, showing minimal statistical significance to NOx emissions. Also shown in Figure 12 are the top 15 parameters passing the most p-value tests, meaning they are the most likely to correlate linearly with NOx emissions. This process showed that nine parameters completely failed the p-value test and were consequently removed from the training set, leaving a total of 32 parameters. Removing parameters that are uncorrelated with NOx is expected to reduce training times as well as remove any bias from the model. The model results for the verification dataset, using these 32 parameters with 16 neurons (the initial number of neurons was determined using equation (5)), can be seen in Figures 13 and 14, where they are compared against the original 87-parameter model. First, a reduction in model error is observed in the mid-range of experimental NOx results. Second, the variation in low-NOx prediction is significantly reduced, with the model results being clustered around the 100 NOx units region. This resulted in an improvement in the R² value of 2%. Overall, an improvement in the low-NOx prediction region is observed without affecting the high-NOx predictions.
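The repeated screening procedure can be sketched as follows. X (with 41 columns) and y are assumed to hold the candidate inputs and the NOx target, and scipy's single-variable linear regression supplies the p-value for each input.

```python
import numpy as np
from scipy.stats import linregress

# 100 random 1000-point subsamples; one single-variable linear regression per
# input; Bonferroni-corrected significance threshold, as described above.
ALPHA = 0.05 / 41                   # Bonferroni-corrected significance level
rng = np.random.default_rng(0)

pass_counts = np.zeros(X.shape[1])  # X: (n_samples, 41), y: NOx (assumed)
for _ in range(100):
    rows = rng.choice(len(X), size=1000, replace=False)
    for j in range(X.shape[1]):
        if linregress(X[rows, j], y[rows]).pvalue < ALPHA:
            pass_counts[j] += 1     # passing frequency, as plotted in Figure 12
```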
From equation (5), the optimal number of hidden neurons, considering the 32 input parameters mentioned above, is either 15 or 16. To assess this, the MSE over a range of neuron numbers was calculated, as shown in Figure 15, using a process similar to the one in Figure 5. The test was initialized with 24 neurons, and their number was reduced in increments of 2 for a total of 11 cases. The variation in MSE for 12 neurons and above is less than 10%. It should be noted that this is the average squared error across all the test points, so this variation is actually small. It is also worth noting that the training time increases substantially as more neurons are included. Therefore, 12 neurons were chosen in order to provide the best compromise between training time and model accuracy.
ANN models with different numbers of neurons applied to the verification dataset are also presented. Figures 16 and 17 compare the verification results for different numbers of neurons. It can be seen that the 12-neuron case presents the smallest error in the low-NOx region without compromising the model's accuracy in the high-NOx region (as opposed to the 4-neuron case). A high R² value of 0.973 is also obtained with the 12-neuron model. A clear deviation from the unity slope is seen when four neurons are used, indicating a possible underfitting issue in the training sets. Minimal accuracy is gained when 16 neurons are used compared to 12 neurons in the built model.
Overall, reducing the number of parameters feeding into the model and optimizing the number of neurons, as previously discussed, resulted in reduced training times without compromising accuracy. At this point it is worth noting that the resulting training time will depend on various factors, most notably computational power, data size and the type of training algorithm. As an indicative value for the results presented here, reducing the parameter count from 87 to 32 resulted in ~20 times fewer training hours being required.
The p-value test has enabled us to build a model with 32 inputs, which greatly reduced the computational time required. However, the number of inputs for ANN training is still significant given the large amount of experimental data. In the following study, we performed further input parameter reduction based on the parameters' passing-frequency rankings in the p-value test. The purpose of this test is to show whether the p-value test is capable of distinguishing critically important input parameters from those that do not significantly affect the results. We strategically removed an increasing number of input parameters based on their p-value test passing frequency and retrained the ANN. As seen in Figure 12, 14 input parameters passed the p-value test 90% of the time; we therefore focused on the remaining 18 parameters for input variable reduction. Table 4 shows the input parameter deletion process, with the number of deleted parameters increased in increments of two each time. The passing frequency of the p-value test for each parameter is also listed. In order to quantify the significance of the deleted parameters, each newly built, reduced ANN model was applied to the verification data. The new model's MSE on the verification data is then compared with the 32-parameter MSE, which is considered the baseline case. Figure 18 shows the effect of deleting input parameters on the verification MSE, measured by the ratio of the reduced model's MSE to that of the 32-parameter model. The MSE is only kept at around the same level when two parameters are deleted. Deleting more than two parameters showed a larger MSE increase compared to the 32-parameter model, and a greater fluctuation is seen when six or more parameters are deleted. Notably, even the deletion of IMEP, which passed only four times, caused the ANN model's predictions to deteriorate significantly, with the MSE more than doubling compared to the baseline case. These results show that using the p-value to inform on the importance of certain parameters over others can be misleading, irrespective of passing frequency. Therefore, an alternative filter method is needed if further parameter reduction is required.
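A sketch of this deletion experiment is given below. It reuses the pass_counts array from the p-value sketch above and again substitutes scikit-learn training for the LM algorithm actually used, so it illustrates the procedure rather than reproducing the study's numbers; X_train, y_train, X_verif and y_verif are assumed to hold the 32-parameter training and verification data.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

# Rank columns from lowest to highest p-value passing frequency.
order = list(np.argsort(pass_counts))

def verification_mse(columns):
    """Retrain on a column subset and return the verification MSE."""
    net = MLPRegressor(hidden_layer_sizes=(12,), activation="logistic",
                       max_iter=2000, random_state=0)
    net.fit(X_train[:, columns], y_train)
    return mean_squared_error(y_verif, net.predict(X_verif[:, columns]))

baseline = verification_mse(order)        # full 32-parameter baseline model
for k in range(2, 19, 2):                 # delete 2, 4, ..., 18 parameters
    ratio = verification_mse(order[k:]) / baseline
    print(f"deleted {k:2d} lowest-frequency inputs: relative MSE = {ratio:.2f}")
```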

Pearson correlation coefficient
The previous sections showed the use of the p-value test in determining the importance of the model's parameters, along with its shortcomings. The caveat of using a p-value test for input variable selection is that, with a sufficiently large sample size, as in this work, it will almost always mark a parameter as important unless there is no correlation at all 31 (i.e. the nine parameters deleted in Figure 12). Furthermore, a p-value test does not indicate how important an input parameter is (only whether it is important or not), and the previous section showed that the passing frequency is not a reliable metric of parameter importance. Consequently, in this section we discuss the use of the Pearson correlation coefficient as a filter algorithm for highlighting important parameters and reducing the input variables further. The correlation coefficient between each input parameter and NOx was calculated and ranked in order of absolute value. A common approach is to mark a parameter as relevant when its correlation coefficient is greater than $2/\sqrt{n}$, 32 where $n$ is the number of parameters tested. Thus, for the 32 parameters used, this threshold value was set to 0.3535.
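The corresponding filter can be sketched in a few lines; X and y again stand for the 32 remaining inputs and the NOx target (assumed).

```python
import numpy as np

# Rank inputs by |r| against NOx and keep those above the 2/sqrt(n) relevance
# threshold, where n is the number of parameters tested (0.3535 for n = 32).
n_params = X.shape[1]                        # X: (n_samples, 32), y: NOx
threshold = 2.0 / np.sqrt(n_params)

r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_params)])
ranked = np.argsort(-np.abs(r))              # most correlated first
selected = [j for j in ranked if abs(r[j]) > threshold]
```

Unlike the p-value passing frequency, the magnitude of r here provides a direct ranking of relative importance.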
When this metric is used, the results in terms of parameter importance are significantly different from those of the p-value test. Overall, of the 32 parameters passing the p-value test, 14 passed the Pearson correlation threshold (i.e. 0.3535); however, these were not the same parameters highlighted as most important by the p-value test. Figure 19 shows the parameters passing the Pearson correlation threshold, marked in red, overlaid against the p-value test results in blue. Interestingly, the Pearson correlation method highlights parameters across the whole range of p-value test results, irrespective of passing frequency. Most notably, four parameters passing 90% of the p-value tests were not highlighted as important, whereas five parameters with a passing frequency of less than 50% passed the Pearson correlation threshold mentioned above.
These results help explain the findings in Figure 18. When the first two parameters are deleted, the relative MSE stays fairly unchanged (close to 1). However, when the next parameters of lowest importance according to the p-value test are removed, the error almost triples. This is because gross indicated mean effective pressure (gIMEP) has been deleted, a parameter flagged as important according to the Pearson correlation threshold. Similarly, when the bottom six parameters are removed, the error increases even further, since volumetric efficiency was shown to be the second most important parameter according to the Pearson correlation test. These results can be seen in Table 5.
The results in Table 5 show, perhaps not surprisingly, that EGR has the highest negative correlation coefficient with NOx, an effect that is very well understood and widely covered in the literature. 33 These results also show that volumetric efficiency is the second most correlated parameter to NOx. By definition, volumetric efficiency indicates the amount of fresh charge entering the cylinder in relation to the cylinder's swept volume. The higher the volumetric efficiency, the more fresh charge, and thus more oxygen, is available during combustion; consequently, since NOx formation depends on both temperature and oxygen availability, NOx will increase, all other things being equal. In the context of this work, volumetric efficiency can also be considered a measure of charge dilution (i.e. the presence of EGR gases in the cylinder) since EGR gases replace part of the fresh charge entering the cylinder, thus reducing volumetric efficiency. Similarly, the mass flow rate of inlet air (as well as that of the exhaust gases, by continuity) is also highly correlated with NOx, since an increase in the mass flow rate of inlet air results in a higher oxygen concentration in the cylinder. Table 5 also shows that the inlet and EGR cooler outlet temperatures have a negative correlation with NOx. This result can be attributed to thermal throttling, where an increase in temperature leads to a reduction in charge density and consequently a reduction in oxygen availability, thus reducing NOx emissions. On the other hand, a higher inlet temperature can also lead to a higher charge temperature at the time of inlet valve closing, thus increasing the peak cylinder temperature and consequently NOx. Laddomatos et al. have shown that these two competing effects tend to cancel out, leading to minor changes in NOx. However, due to the nature of the tests included in the dataset, the inlet and EGR cooler outlet gas temperatures are directly linked to EGR levels (i.e. increased EGR flow leads to higher temperatures), which explains their negative correlation with NOx.
In addition, various parameters related to engine load also passed the Pearson correlation test. Peak cylinder pressure ($P_{max}$) was highlighted as the parameter of highest importance, in agreement with its well-known correlation to peak cylinder temperature and thus thermal NOx formation. 34 These results are also supported by the work of Leach et al., 4 who showed that peak cylinder pressure correlates very closely with NOx in a diesel engine. In their work, they examined various combustion-related parameters for NOx correlation; however, IMEP was the only parameter with a comparable, yet smaller, correlation than $P_{max}$, a result that is also observed in Table 5.
Finally, a direct effect of higher peak cylinder temperatures is increased heat transfer to the cylinder walls. As a result, the temperature change of the coolant across the cylinder jacket and cylinder head is expected to rise due to the increased energy dissipation to the coolant. This explains the correlation of the coolant temperature difference across the cylinder head and cylinder jacket with NOx emissions, presented in Table 5. The measurement locations for all temperature-related parameters are shown in Appendix 1.
The 14 parameters resulting from the Pearson correlation test were then used to construct an ANN model, which was compared against the 32-parameter model. Seven neurons were used, following the procedure described earlier. Figure 20 shows this comparison for the 0%-15% EGR range of the verification results. It can be seen that the 14-parameter model resulting from the Pearson correlation test gives improved NOx predictions in the 12%-14% EGR range without compromising the low-EGR region. Overall, the error in NOx prediction is less than 12%. In the 15%-35% EGR range, the 14-parameter model also outperforms the 32-parameter model, as can be seen in Figure 21. Apart from one point where both models struggle to predict, the 14-parameter model leads to a smaller prediction error for all points tested. It is also worth noting that both models are capable of accurately predicting very low NOx values.
The findings in Figures 20 and 21 further highlight the aforementioned challenges of relying solely on the p-value as a method of input variable selection. Despite the equally good performance of the model constructed using the p-value test results (i.e. 32 parameters), further reducing the model's input variables can remove unwanted bias as well as reduce training times. This reduction in redundant variables could explain the improvement in model prediction at very low NOx values with the 14-parameter model.

Conclusion
This work presented the use of an ANN model to predict NOx emissions from a high-speed direct injection diesel engine, using a comprehensive dataset of 110,000 test points for training. The dataset included a wide range of engine speed and load conditions over a broad range of EGR rates, thus allowing for the implementation of the ANN under real-life operating conditions. Initially, two backpropagation algorithms, namely LM and Bayesian regularization, were compared in order to identify which approach produced the best training performance while keeping the training time as low as possible. It was found that both algorithms are capable of accurately predicting the target NOx values; however, for the current dataset, Bayesian regularization resulted in a 10-fold increase in computational time and as such was not considered further.
A strong focus was placed on reducing the input variables to the model during this work, as this can reduce both the computational expense during training and any unwanted bias from parameters that have very little significance to the model. Consequently, two filter strategies were used for input variable selection: a p-value test and the Pearson correlation coefficient.
The results showed that the p-value test is capable of highlighting parameters of importance to the model; however, it is only suitable for indicating the existence of a correlation (i.e. the selected parameters cannot be ranked in order of importance). The Pearson correlation test, on the other hand, was able to highlight which parameters were important while at the same time using the correlation coefficient as a metric for variable ranking. Interestingly, the two filter methods led to different input variable selections, with the p-value test highlighting 32 parameters, whereas the Pearson correlation test highlighted 14 parameters as important. Two independent models were created using each filter method, and in both cases the results showed very good agreement with the experimental data. However, the model created using the Pearson correlation test showed a significant improvement in the low-NOx predictions, which was attributed to a reduction in model redundancies. In this study, we have not proposed an objective standard for selecting the optimum number of input variables in the NOx ANN model. Nevertheless, the filter strategy presented here, based on the p-value test and Pearson correlation, does provide a quantitative way of selecting the ANN input variables.
Finally, this work has shown the successful implementation of an ANN for the prediction of NOx emissions from diesel engines using a completely independent verification dataset. The verification dataset comprised a complete five-point EGR sweep over two speed/load conditions that were completely outside the training envelope of the ANN model. The results show very good agreement with the target NOx values, with the error being as low as 3% in the high-NOx region and 6% in the low-NOx region. This indicates that an ANN has the potential to predict NOx emissions well outside its training range and with high accuracy, thus providing a useful tool for guiding further experimental and numerical studies on NOx emissions.