Predicting the availability of continuous mining systems using LSTM neural network

This work develops a model for predicting the availability of continuous systems at open pits using artificial neural networks. The main idea is to improve on the analytical approach, which rests on the initial assumption that the lengths of time the system spends in failure follow an exponential distribution. Data related to the I ECC (excavator, conveyors, crushing plant) system of the Open Pit Drmno, Kostolac, are used for this work. The aim of this work is to improve a model for predicting the availability of continuous systems at open pits. On the basis of the RMSE, MAE, and $R^2$ values presented in this work, it is concluded that the model obtained using the neural network has a higher predictive power than the analytical approach. A corresponding simulation, created on the basis of the obtained model, shows the range of system availability for each type of failure. The simulation also gives a more precise picture of the availability of continuous systems at open pits.


Introduction
The aim of this work is to improve a model for predicting the availability of continuous systems at open pits. Based on the new model, a picture of the system state is defined, which plays a role in the planning and control of exploitation and in adopting an appropriate maintenance strategy, all with the aim of stable production and cost reduction.
Coal is the basic energy fuel in electricity production. Continuous systems are used for coal mining at the open pits of the Electric Power Industry of Serbia. These are high-capacity, complex excavation systems whose operation is crucial for a reliable supply of coal to the Thermal Power Plant. For this reason, the availability of one such system was analyzed on the example of the Open Pit Drmno. Coal exploitation in the Kostolac Basin began in 1870. The Open Pit Drmno is the only active mine in the Kostolac Basin and produces 25% of the coal (lignite) in Serbia, see Bugarić et al. 1 In the previous period, coal capacity at the Open Pit Drmno was planned to grow from the current $9 \times 10^6$ to $12 \times 10^6$ t/year, and overburden from $40 \times 10^6$ to a maximum of $55 \times 10^6$ m$^3$/year. Coal excavation is carried out by two ECC systems with one export conveyor, with occasional engagement of a dragline excavator as necessary equipment. The excavated coal from both systems is transported by a collective conveyor to a distribution bin and further to the Crushing Plant, stockpile, and Thermal Power Plant. The ECC systems consist of the following elements: a bucket wheel excavator, a series of conveyors, and a crushing plant. If one element of the ECC system fails, the whole system stops working, see Bugarić et al. 1
1 Mining and Metallurgy Institute Bor, Bor, Serbia

Literature review
In recent years, published works on machine learning in mining indicate that artificial neural networks, as a machine learning method, are increasingly applied in this field. Most of the works are related to the blasting process, see references. In addition to works related to the mining process, there are also works in which models are developed for detecting geomechanical anomalies, [39][40][41] analyzing available resources, [42][43][44][45] and assessing the impact of mining works on the environment. 46,47 Machine learning is also used for risk assessment of landslides at the open pit, 48 visual detection of objects at the open pit that can classify workers and mining machinery, 49 and prediction of the health risks to drivers caused by vibrations during truck transport. 50

ECC system
The most general definition of a system describes it as a functional unit of several interconnected elements. Within the coal mining system, continuous systems represent the systems with the greatest complexity. The basic function of continuous systems is excavation, transport, and disposal of coal, which can be simply described as the coal production. These systems with continuous operation provide a continuous, uninterrupted flow of material from the place of excavation to the place of disposal, which conditions a high functional connection of its elements. The main objective of continuous systems in coal production is the realization of stable and reliable production of suitable capacity. These systems are connected in a series connection as it can be seen in Figure 1.

I ECC system
This work presents a Case Study for determining the availability of a continuous system on coal from the Open Pit Drmno, which consists of the following elements (subsystems): SRs 400 bucket wheel excavator, BRs 2400 beltwagon, a series of belt conveyors, and Crushing Plant. The layout of the ECC system is shown in Figure 2.

Subsystem characteristics
SRs 400 bucket wheel excavator. The bucket-wheel excavator is a self-propelled, continuous-action machine intended for the excavation of overburden and ore at open pits. Material is excavated with buckets that are evenly distributed and attached to the rotor rim.
As the rotor rotates in a vertical plane and the rotor boom rotates together with the platform in a horizontal plane, each bucket digs out a section of the massif whose shape and geometric parameters are determined by this motion. As the rotor turns and the full buckets reach the unloading sector, the material is emptied from the buckets onto the receiving belt conveyor on the rotor boom and passed on, depending on the number of conveyors on the excavator, to the last unloading conveyor. [51][52][53][54][55][56][57][58] Bucket-wheel excavators are considered to be among the most complex machines and are characterized by continuous development and modernization during their lifetime. 59 The SRs 400.14/1.5 bucket-wheel excavator operates within the I ECC system. This bucket-wheel excavator was manufactured by the German company Takraf and was purchased in 1985. 60 The bucket-wheel excavator is in itself a very complex machine system. Like any system, it is composed of a number of subsystems: the subsystem for excavation, the subsystem for excavator movement, the subsystem of the receiving conveyor, the subsystem of the conveyor stacker, and the subsystem for swiveling the upper structure. 1,61 According to the German classification, bucket-wheel excavators are divided by their basic construction characteristics into the following classes: A (compact excavator), B (excavator with C frame), and C (giant excavator) (Figure 3). 51,52,57,58,62 This excavator belongs to the group of compact rotary excavators. Compact excavators have a relatively short boom in relation to the diameter of the working wheel. 1,62 The rotary excavator operates in very difficult conditions, where high productivity, reliability, availability, and safety at work are constantly expected from it as the carrier of production.
The operating effects of mining machines depend on their reliability, functioning, technical and technological performance, handling, maintenance, logistic support, and adaptability, that is, the match between the performance of the machines and the characteristics of the working environment. 1,63 Figure 4 shows the SRs 400.14/1 bucket-wheel excavator. Table 1 presents the structural and technical characteristics of the SRs 400.14/1 bucket-wheel excavator.
BRs 2400 beltwagon. The beltwagon is the connection between the excavation and transport equipment within the continuous system. Its mobility enables an increase in the technological parameters of excavator operation in plan and height, more efficient use of the bucket-wheel excavator within the bench system of the open pit, and better time utilization. By construction, beltwagons can be rigid or with rotating booms. Their capacity should be aligned with the capacity of the excavation equipment. Figure 5 presents the BRs 2400 beltwagon. Table 2 gives the structural and technical characteristics of the BRs 2400 beltwagon.
Belt conveyors. Continuous transport with conveyors is increasingly used at open pits of medium and large capacity. 65 Transport of overburden and coal is one of the most important parts of the technological process of lignite exploitation. Transport costs account for 40%-60% of the total operating costs. 66 Figure 6 shows the basic parts of the conveyor. The basic parts of a belt conveyor are: the endless rubber belt that represents the carrying and haulage body, the supporting structure of the conveyor that carries the upper and lower sets of pulleys, the drive station, the return or end station, the tightening device, the cleaning device for belts and drums, the loading or unloading part, and the apparatus for control and automatic control. 52 Table 3 gives the design and technical characteristics of a belt conveyor on the I ECC system.

Description of the data set
There is no machine (continuous system) that operates without failure. Failures on continuous systems have negative production and economic effects. A failure or breakdown is the cessation of an element's ability to perform its function. A failure can be complete (machine shutdown) or partial (the machine works but with deteriorated characteristics), see Ivković. 68 On the basis of data obtained from the Electric Power Industry of Serbia, which also includes the Open Pit Drmno, a time-related database was formed for mechanical failures (damage of the upper structure bearings, cracking of crawlers, tooth replacement, etc.), electrical failures (cable breakdown, interruption of TT connection, blockade breaking, etc.), and other failures (overhaul, service, conditional standstill due to bad weather conditions, etc.) of the I ECC system (SRs 400) for a period of 4 years (2016, 2017, 2018, and 2019), see Bugarić et al. 1 The Python 3.7.7 programming language in the PyCharm editor was used for data processing, as well as for further analysis and availability prediction of continuous systems. Table 4 shows a part of the database. The database contains the date, the facility on which the failure (delay) occurred, the exact time and date of the beginning and end of the delay, and the total time in delay.
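As an illustration of how the total time in delay can be derived from the recorded beginning and end of each delay, a minimal sketch follows. The record layout, the field names, and the timestamp format are assumptions made for the example and do not reproduce the actual structure of the database.

```python
from datetime import datetime

# Hypothetical records; field names and timestamp format are assumptions.
records = [
    {"facility": "SRs 400", "type": "mechanical",
     "start": "2016-01-05 08:15", "end": "2016-01-05 09:45"},
    {"facility": "BRs 2400", "type": "electrical",
     "start": "2016-01-07 22:30", "end": "2016-01-08 01:00"},
]

FMT = "%Y-%m-%d %H:%M"


def delay_minutes(record):
    """Return the delay length in minutes from the start and end stamps."""
    start = datetime.strptime(record["start"], FMT)
    end = datetime.strptime(record["end"], FMT)
    return (end - start).total_seconds() / 60.0


# One delay length per record, in minutes.
durations = [delay_minutes(r) for r in records]
```

Working with full timestamps rather than clock times alone keeps delays that run past midnight, such as the second record, from producing negative lengths.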
The basic idea is that neural networks can improve the analytical approach to determining the availability of continuous systems at open pits, which assumes that failure rates have an exponential distribution. In order to properly demonstrate the advantage of neural networks, the calibration set (in the case of the neural network, the calibration set is further divided into a training and a validation set) and the test data set must be the same for both models. The same test statistics, RMSE (Root Mean Square Error) and MAE (Mean Absolute Error), are used in both cases. These statistics are defined by:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y^{a}_{i} - y^{p}_{i}\right)^{2}}, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y^{a}_{i} - y^{p}_{i}\right|,$$

where $y^{a}_{1}, y^{a}_{2}, \ldots, y^{a}_{n}$ are the actual values, and $y^{p}_{1}, y^{p}_{2}, \ldots, y^{p}_{n}$ are the values predicted by the model. As an additional statistic, the determination coefficient $R^2$ is used, defined by:

$$R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(y^{a}_{i} - y^{p}_{i}\right)^{2}}{\sum_{i=1}^{n}\left(y^{a}_{i} - \bar{y}^{a}\right)^{2}},$$

where $\bar{y}^{a}$ is the mean of the actual values. The initial database is divided into three groups based on the type of failure (mechanical, electrical, and other failures), and then each part is divided into a calibration set and a test data set. The calibration data set contains the data for the given type of failure on which the model is developed, and covers a period of 3 years (2016, 2017, and 2018). The test data set contains the data on which the predictive power of the obtained model for the given type of failure is assessed, and includes the last year of data in the database (2019). In the case of mechanical failures, the percentage of data on which the model is trained is 77.51%. In the case of electrical failures it is 75.24%, while for other failures it is 79.34%. Table 5 presents the total number of failures by type and year.
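The three statistics named above can be computed as in the following minimal sketch. The function names are illustrative, and $R^2$ is taken in its usual form of one minus the ratio of the residual to the total sum of squares, which is an assumption about the exact definition used in this work.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error between actual and predicted values."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


def r2(actual, predicted):
    """Determination coefficient, assumed here to be 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot
```

Applying the same three functions to the predictions of both models on the same test set is what makes the comparison between the analytical approach and the neural network meaningful.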
Table 6 presents the mean value, standard deviation, minimum value, first quartile, median, third quartile, and maximum value before and after outlier treatment on the calibration data set for each type of failure. Outliers were treated using the Z-Score method.
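A minimal sketch of Z-Score outlier treatment is given below. The cut-off of three standard deviations is a common convention and is an assumption here, since the threshold actually used is not stated; the function name is illustrative.

```python
import statistics


def remove_outliers_zscore(values, threshold=3.0):
    """Keep only values whose absolute z-score does not exceed the threshold.

    The threshold of 3.0 is an assumed, conventional choice.
    """
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return list(values)  # all values identical, nothing to remove
    return [v for v in values if abs(v - mean) / std <= threshold]
```

For example, twenty failures of 10 minutes and one of 1000 minutes give the long failure a z-score of about 4.5, so it would be dropped from the calibration data.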
Extracting a large number of feature data and determining the distribution of failure intensity do not give a clear picture of the statistical set, because a large number of classes is obtained, so the statistical data set is divided into equal intervals. The total interval of the statistical set, $I = t_{\max} - t_{\min}$, where $t_{\min}$ and $t_{\max}$ are the minimum and maximum failure lengths, is divided into $5\log_{10}(n)$ equal parts, where $n$ is the total number of data for a particular type of failure, see Djurić. 69
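The division into equal intervals can be sketched as follows. Rounding $5\log_{10}(n)$ to the nearest integer is an assumption, since the number of classes must be a whole number and the rounding rule is not stated; the function name is illustrative.

```python
import math


def class_intervals(lengths):
    """Split [t_min, t_max] into about 5*log10(n) equal-width intervals."""
    n = len(lengths)
    t_min, t_max = min(lengths), max(lengths)
    # Rounding to the nearest integer is an assumption of this sketch.
    n_classes = max(1, round(5 * math.log10(n)))
    width = (t_max - t_min) / n_classes
    return [(t_min + j * width, t_min + (j + 1) * width)
            for j in range(n_classes)]
```

With 101 failure lengths spanning 0 to 100 minutes, this gives 10 intervals of 10 minutes each.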

Methodology
Availability is calculated on the basis of a time state picture, in which periods when the system is in good condition (the "up-time") alternate with periods when the system is out of order (the "down-time"). The time state picture is shown in Figure 9. The time when the system is in good condition can be divided into inactive time, that is, the time while the system is waiting for work (stand-by) ($t_{11}$), and the time when the system is working ($t_{12}$). The time when the system is in failure is divided into organizational time ($t_{21}$), logistic time ($t_{22}$), and active repair time ($t_{23}$), which can be time for corrective repairs ($t_{231}$) or time for preventive repairs ($t_{232}$), see Djenadic et al. 70 Availability is defined as the quotient of the total time during which the system is in good condition and the sum of the time in good condition and the time in failure (operational availability), see Djenadic et al. 70 One of the most common approaches to determining the availability of continuous systems is based on the assumption that the lengths of time the machine spends in failure have an exponential distribution (see Djenadic et al. 70). Because the RMSE, MAE, and $R^2$ statistics show that this analytical approach fits the available data poorly (see Table 7), the distribution of the lengths of time the machine spends in failure is instead estimated by a model obtained using neural networks. The obtained model describes the actual state of the data better. By applying the obtained neural network, failure lengths are then generated, on the basis of which the availability of a continuous system over a period of 1 year is calculated. Figure 10 schematically presents the flowchart of the methodology.
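Using the time components defined above, the verbal definition of operational availability can be written compactly. The symbol $A$ is introduced here only for illustration, and each $t$ is assumed to denote the total of the corresponding time over the observation period.

```latex
A = \frac{t_{11} + t_{12}}{t_{11} + t_{12} + t_{21} + t_{22} + t_{23}},
\qquad t_{23} = t_{231} + t_{232}
```

For instance, under these assumptions a system that is up for 7000 h and down for 1760 h in a year has $A = 7000 / 8760 \approx 0.80$.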
The research process of this study is as follows:
Step 1 - Loading and processing of raw data. The appropriate Python libraries (pandas, numpy) were used to load and organize the data, and all the processing needed to prepare the data for further analysis and to create an appropriate model for predicting system availability was performed.
Step 2 - Preparation of data for the model. The input variables that affect the availability of the continuous mining system in this study are the interval limits and the year. They are ordered according to the experiment sequence and split into calibration and test values.
Step 3 - Deep learning of failure intensity data. The numbers of input, hidden, and output layers are defined in the neural network framework. The activation functions and the optimizer are also chosen in this step.
Step 4 - Choice of model architecture and hyperparameters. The batch size and learning rate are set in the neural network framework. Training is repeated until the neural network converges.
Step 5 - Model evaluation on a test set. Once the neural network model has been trained, its outputs are obtained as predictions. The RMSE, MAE, and $R^2$ values are used to compare the predicted and actual values.
Step 6 - Simulation. In this step, a simulation is defined that describes the availability of the continuous mining system in the next year.

Results and discussions
Analytical approach and exponential distribution
The analytical approach (AP) to determining the availability of continuous systems at open pits assumes that the failure lengths follow an exponential distribution EXP($\lambda$), where the parameter $\lambda$ is also called the failure intensity. The density function pdf($x$) and the distribution function cdf($x$) are given by:

$$\mathrm{pdf}(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & x < 0 \end{cases} \qquad (5)$$

and

$$\mathrm{cdf}(x) = \begin{cases} 1 - e^{-\lambda x}, & x \ge 0 \\ 0, & x < 0 \end{cases} \qquad (6)$$

Using the property that the expectation of the exponential distribution EXP($\lambda$) is equal to $1/\lambda$, the corresponding values of the parameter $\lambda$ for each type of failure are obtained. In addition to the obtained values of $\lambda$, Table 7 states the RMSE, MAE, and $R^2$ values on both the calibration set and the test data set for each type of failure. Large deviations between the values predicted by the model and the observed failure frequencies can be seen in Figure 11: the failure frequencies in the first interval are underestimated, while those in the second and third intervals are overestimated.
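The analytical approach can be illustrated with a short sketch: the parameter is estimated as the reciprocal of the mean failure length, and the probability that a failure length falls in an interval is the difference of the distribution function at its limits. The function names and the sample values are illustrative assumptions.

```python
import math


def fit_exponential(lengths):
    """Estimate the failure intensity as 1 / mean failure length."""
    return len(lengths) / sum(lengths)


def exp_cdf(x, lam):
    """Distribution function of EXP(lam)."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0


def interval_probability(lower, upper, lam):
    """Probability that a failure length lies between lower and upper."""
    return exp_cdf(upper, lam) - exp_cdf(lower, lam)


# Illustrative failure lengths in minutes (not data from the study).
lam = fit_exponential([10, 20, 30])
p_first = interval_probability(0, 20, lam)
```

Comparing such interval probabilities with the observed relative frequencies is what yields the RMSE, MAE, and $R^2$ values of the analytical approach.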

Neural network approach
Neural network. Neural networks are the most popular method of machine learning, see Das and Behera. 71 Artificial neural networks (ANNs) represent a branch of artificial intelligence, see Monjezi et al. 31 The first artificial neural network was proposed by McCulloch and Pitts (1943), and since then such networks have been popular and applicable in various fields of science and technology for solving complex problems, see Sayadi et al. 5 Artificial neural network (computational) models are inspired by the structure and functioning of biological nervous systems. Mimicking the functioning of biological nervous systems makes them adaptive systems that learn from examples, find dependencies in data that are not apparent, find new ways of processing data, or change their mode of operation to reach an optimum solution, unlike classical models that rely on linear programming, see Stojadinović. 72 The basic variant is the fully connected neural network, which consists of basic computational units called neurons. The neurons are organized into layers so that the neurons of one layer receive as their input arguments the output arguments of all neurons of the previous layer, and forward their output arguments to all neurons of the next layer. All layers whose neurons pass their output arguments to other neurons are called hidden layers. The input arguments of the first layer are the input arguments of the network, and the output arguments of the last layer are the output arguments of the network. The output argument of each neuron is the value of an activation function (in practice, ReLU, sigmoid, linear, and exponential functions are the most often used) applied to a linear combination of the input arguments of that neuron.
Formally, the model is defined as follows:

$$h_0 = x, \qquad h_i = g\left(W_i h_{i-1} + w_{i0}\right), \quad i = 1, 2, \ldots, L,$$

where $x$ is the input argument of the network, $L$ is the number of layers, $h_i$ is the output of layer $i$ (so that $h_L$ is the output of the network), $W_i$ is the matrix of coefficients and $w_{i0}$ is the vector of free coefficients of the linear combination, while $g$ is the activation function, see Nikolić and Zečević. 73 Using an error function that compares the actual values with the values predicted by the neural network, the coefficients of the linear combination of input arguments are adjusted for each neuron. This adjustment is done by the backpropagation algorithm, which consists of three steps: calculating the output argument of the network, calculating the error function that compares the actual and predicted values, and correcting the linear combination coefficients for each neuron.
In practice, RMSE and MAE are the most commonly used error functions.
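The layer-by-layer computation of a fully connected network, in which each layer applies an activation function to a linear combination of the previous layer's outputs, can be sketched in plain Python as follows. The weights, layer sizes, and function names are illustrative assumptions, not values from the trained models.

```python
import math


def relu(v):
    return [max(0.0, x) for x in v]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def dense(W, w0, h):
    """Linear combination W h + w0 for one layer (W is a list of rows)."""
    return [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(W, w0)]


def forward(x, layers):
    """Apply each layer (W, w0, g) in turn, starting from the input x."""
    h = list(x)
    for W, w0, g in layers:
        h = g(dense(W, w0, h))
    return h


# Illustrative two-layer network: 2 inputs -> 2 hidden neurons -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 1.0], relu),
    ([[1.0, 1.0]], [0.0], sigmoid),
]
output = forward([2.0, 3.0], layers)
```

Backpropagation would then compare `output` with the actual value through the error function and adjust every entry of `W` and `w0`; that step is omitted from this sketch.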

LSTM networks
There are two basic problems with recurrent neural networks: the first is the problem of vanishing and exploding gradients, while the second concerns the long-term storage of information and the modeling of long-term dependencies in data. Both of these problems are addressed by long short-term memory (LSTM), a complex network unit with a specific structure that allows control of reading from and writing to the unit. The basic idea of the LSTM is the existence of a so-called cell that keeps a hidden state, with writing, reading, and forgetting controlled on the basis of learned rules. A specific formulation is given below 73,74 :

$$z_t = g\left(W_z x_t + U_z h_{t-1} + b_z\right)$$
$$i_t = \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right)$$
$$f_t = \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right)$$
$$q_t = \sigma\left(W_q x_t + U_q h_{t-1} + b_q\right)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot z_t$$
$$h_t = q_t \odot g\left(c_t\right)$$

where $c$ denotes the cell that stores the LSTM unit state, $z$ the transformed value of the input, $i$ the value of the input gate, $f$ the value of the forget gate, $q$ the value of the output gate, and $h$ the value of the LSTM unit output; $x_t$ is the input at step $t$, $\sigma$ is the sigmoid function, $g$ is an activation function, and $\odot$ denotes element-wise multiplication. Each of the gates has its own set of parameters, marked with the appropriate index. The architecture of the LSTM unit according to Goodfellow et al. 74 is given in Figure 12. Figure 13 shows the structure of the LSTM unit compared to the standard recurrent unit (SRN). The essence is quite simple. Each of the gates has the same structure as a unit of a standard recurrent network and, based on the received input and its associated parameters, computes its output. 73 For example, the input gate controls whether the input is passed to the cell: based on the input (the first term in the definition of the input gate activation) and on the state in the previous step (the second term), it calculates the coefficient used to control the effect of the input on the cell state. Similarly, the forget gate controls the effect of the previous state of the cell on its new state. There are various modifications of the LSTM, but the discussion here is limited to this basic form. The LSTM is important for two reasons.
First, thanks to the gates that control the input to the cell, the cell does not have to accept the input signals and can therefore store information about distant parts of the sequence for a long time. Whether it should receive the input signals or not is learned, since the input gate is parameterized. It should be kept in mind that in practice a large number of such units are used in parallel, so that while some units retain earlier information, others can process the current input, combine it with earlier information, and so on. 73 The second reason for the significance of the LSTM is that it mitigates the problem of vanishing gradients. Namely, in an ordinary recurrent network, the new value of the hidden layer is obtained by taking the output of that layer in the previous step as input and transforming it by the activation function. This repeated application of activation functions leads to repeated multiplication by the derivatives of the activation functions when the gradient is calculated. When many such numbers are multiplied, the gradient vanishes. On the other hand, no activation function is applied when a new cell value is calculated. Admittedly, the previous value is multiplied by the value of the forget gate, which includes the sigmoid activation function, so its derivative must appear somewhere in the gradient calculation, but the existence of paths in the computational graph that are not affected by this problem has a noticeable effect. For example, a standard recurrent network usually does not model well dependencies over a distance greater than a dozen steps, whereas recurrent networks with LSTM units successfully model dependencies even over a distance of several hundred steps. 73
Neural network for determining the probability of failure in a certain time interval. The neural network presented in this work predicts the probability that the failure length lies in a predetermined interval for a given year.
Consequently, the input arguments of the network are the interval limits and the year, and the output argument is the probability that an arbitrary failure in the given year lies within the interval limits.
Using the formula for the length of the optimal intervals into which the statistical set is divided, the interval limits and the corresponding failure frequencies are defined for each type of failure and each year in the calibration data set. Additionally, failure frequencies are determined for the interval limits obtained by shifting the originally obtained intervals by k, 2k, 3k, ... minutes.
The optimal value of the parameter k was chosen so that the RMSE on the validation data set, which consists of 20% of randomly selected data from the part intended for calibration, is minimized. For all types of failure, the value of k is equal to 5. Furthermore, for each type of failure, the interval limits and corresponding failure frequencies in the test data set are defined in a similar way.
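The shifting of intervals by multiples of k can be sketched as below. Half-open intervals, relative frequencies as the target values, and the number of shifts are all assumptions of this sketch, and the function names are illustrative; the text does not state how many shifts were used.

```python
def relative_frequency(lengths, lower, upper):
    """Share of failure lengths t with lower <= t < upper (assumed half-open)."""
    count = sum(1 for t in lengths if lower <= t < upper)
    return count / len(lengths)


def shifted_samples(lengths, intervals, k=5, n_shifts=3):
    """Build (lower, upper, frequency) samples for intervals shifted by m*k.

    m = 0 keeps the original intervals; n_shifts is an assumed parameter.
    """
    samples = []
    for m in range(n_shifts + 1):
        shift = m * k
        for lower, upper in intervals:
            lo, hi = lower + shift, upper + shift
            samples.append((lo, hi, relative_frequency(lengths, lo, hi)))
    return samples
```

Each original interval thus contributes several training examples, which enlarges the calibration set without changing the underlying failure data.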
Furthermore, a neural network is defined for each type of failure. The neural network consists of two hidden layers (Figure 14). The type of neural network is a bidirectional long short-term memory (Bidirectional LSTM) network implemented in the keras Python library. The first hidden layer consists of 128 neurons with the ReLU activation function. The second hidden layer consists of 256 neurons with the sigmoid activation function. The activation function of the output layer is the exponential function. The optimizer used in these models is the Adam optimizer with a learning rate of 0.0001. The MSE (defined in (1)) is the error function used in this work. Model development for each neural network was carried out on 80% of data randomly selected from the calibration part (training set), while the remaining 20% was used for validation (validation set). The number of epochs for each model was 2000, while the batch size depended on the type of failure: 1000 for mechanical failures, 300 for electrical failures, and 2000 for other failures. The obtained models were tested on an independent data set (test set), which included the interval limits and the values of the corresponding probabilities for 2019. The input arguments of each neural network are the interval limits and the year, and the output argument is the probability of a failure whose length is within the given interval for the given year. The activation functions used in this work are ReLU, sigmoid, and exponential. Definitions and graphics of these functions are listed in Table 8. As an example, the input and output arguments for electrical failures are presented in Table 9. Figure 15 shows the values of the error function for each model. Table 10 presents the RMSE, MAE, and $R^2$ values on the calibration data set (training set and validation set) and the test data set for each neural network.
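A minimal sketch of a model with the stated layer sizes, activation functions, optimizer, and loss is given below. It assumes the Keras bundled with TensorFlow 2, and it assumes that each sample is fed as a sequence of one time step with three features (lower limit, upper limit, year) and a single output unit; the input shaping actually used is not described, and the names `HYPERPARAMS` and `build_model` are illustrative.

```python
# Hyperparameters as stated in the text; the dictionary layout is illustrative.
HYPERPARAMS = {
    "units": (128, 256),
    "activations": ("relu", "sigmoid"),
    "output_activation": "exponential",
    "learning_rate": 1e-4,
    "epochs": 2000,
    "batch_size": {"mechanical": 1000, "electrical": 300, "other": 2000},
}


def build_model(n_features=3, hp=HYPERPARAMS):
    """Build a two-layer bidirectional LSTM regressor (assumed input shape)."""
    # Imported here so the settings above can be inspected without TensorFlow.
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        keras.Input(shape=(1, n_features)),  # assumption: one time step
        layers.Bidirectional(layers.LSTM(hp["units"][0],
                                         activation=hp["activations"][0],
                                         return_sequences=True)),
        layers.Bidirectional(layers.LSTM(hp["units"][1],
                                         activation=hp["activations"][1])),
        layers.Dense(1, activation=hp["output_activation"]),
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=hp["learning_rate"]),
        loss="mse",
    )
    return model
```

Training one model per failure type would then call `model.fit` with the number of epochs and the batch size listed for that type.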
Comparative analysis of the analytical approach and model obtained using the neural network
Based on the RMSE, MAE, and $R^2$ values presented in Tables 7 and 10 and in the comparative Table 11, it is concluded that the model obtained using the neural network has a greater predictive power.

Availability prediction
For high-capacity mining systems such as the continuous coal excavation system (I ECC system), it is important to anticipate availability in order to define the picture of the system condition needed in the planning phase. Time when the system is not in operation entails production and economic costs. This model is intended to assist the responsible persons at the open pit in the planning and control of exploitation and in adopting an appropriate maintenance strategy, all with the aim of stable coal production and cost reduction. The availability of a specific system as a whole is the basic input for production planning at the lignite open pits of the Electric Power Industry of Serbia, as well as for other activities in planning, production monitoring, and equipment maintenance. Figure 16 shows the failure frequency curves determined by the neural network and by monitoring. On the basis of the model obtained from the neural network, the probability that a failure length falls within the interval limits can be assigned to each of the initially defined intervals of the statistical data set, for each type of failure.

Table 8. Definitions of the activation functions used (plots not reproduced).
Name          Function
ReLU          $g(x) = x$ for $x \ge 0$; $g(x) = 0$ for $x < 0$
Sigmoid       $g(x) = 1 / (1 + e^{-x})$
Exponential   $g(x) = e^{x}$

Simulation. For each interval of the initially divided statistical set and for a given type of failure, $n_{test}$ random numbers are generated, where $n_{test}$ is the number of failures in the test data set (mechanical 369, electrical 299, and others 528). If a random number is greater than the probability obtained from the neural network model, it is considered that the failure did not occur. If the random number is less than the probability, it is considered that the failure occurred. In addition, the length of a failure generated in this way is taken to be equal to the midpoint of the observed interval. In this way, the total failure time of the continuous system for failures with lengths in the observed interval is obtained. The total failure time of the continuous coal system for a particular type of failure is obtained by adding the total failure times of all intervals.
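The simulation step described above can be sketched as follows for one type of failure. Drawing the random numbers uniformly from [0, 1) is an assumption, and the function name and argument layout are illustrative.

```python
import random


def simulate_downtime(intervals, probabilities, n_test, rng):
    """Simulate the total failure time for one type of failure.

    For every interval, n_test uniform random numbers are drawn; each number
    below the interval's probability counts as one failure whose length is
    the interval midpoint.
    """
    total = 0.0
    for (lower, upper), p in zip(intervals, probabilities):
        midpoint = (lower + upper) / 2.0
        occurred = sum(1 for _ in range(n_test) if rng.random() < p)
        total += occurred * midpoint
    return total
```

Called with a seeded generator such as `random.Random(0)`, the function gives reproducible totals; repeating the call yields a sample of simulated failure times for that type of failure.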
By observing all combinations of the simulated mechanical, electrical, and other failure times, 1,000,000 different values of the time that the system spent in failure are obtained. From these, 1,000,000 values of system availability are obtained. Table 12 presents the basic statistics of the simulated values for each type of failure, while Table 13 presents the basic statistics of the simulated values of availability.
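Combining the simulated failure times into availability values can be sketched as below. The sketch assumes that the three failure times simply add up, that the total time is one calendar year expressed in minutes, and that availability is the share of that time not spent in failure; the function name is illustrative. For example, 100 simulated totals per failure type would give $100^3 = 1{,}000{,}000$ combinations, but the number of runs per type is an assumption.

```python
from itertools import product

MINUTES_PER_YEAR = 365 * 24 * 60  # assumed observation period


def availability_values(mech, elec, other, period=MINUTES_PER_YEAR):
    """Availability for every combination of simulated failure times."""
    return [(period - (m + e + o)) / period
            for m, e, o in product(mech, elec, other)]
```

Summary statistics of the resulting list, such as the mean and the quartiles, then describe the expected range of availability for the following year.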

Conclusion
Based on the RMSE, MAE, and $R^2$ values presented in Tables 7 and 10 and in the comparative Table 11, it is concluded that the model obtained using the neural network has a higher predictive power with respect to the probability that the failure length lies in a certain time interval, as shown in Figure 17. Based on the