Nonlinear time series prediction algorithm based on AD-SSNET for artificial intelligence–powered Internet of Things

Time series are widely used in the wireless Internet of Things. This article proposes a nonlinear time series prediction algorithm based on the Small-World Scale-Free Network after the AIC-Optimized Subtractive Clustering Algorithm (AIC-DSCA-SSNET, abbreviated AD-SSNET) to predict nonlinear and unstable time series with improved accuracy. The AD-SSNET is introduced as the reservoir of an echo state network to improve the predictive capability for nonlinear time series, and it is combined with artificial intelligence methods to construct the training samples of the prediction model. First, the optimal clustering scheme of the randomly distributed neurons in the network is obtained adaptively by the AIC-DSCA; then the AD-SSNET is constructed according to the intra-cluster priority connection algorithm. Finally, the reservoir synaptic matrix is calculated from the synaptic information. Experimental results show that the proposed algorithm extends the feasible range of the spectral radius of the reservoir and improves the prediction accuracy of nonlinear time series, which is of great significance to time series analysis in the era of the wireless Internet of Things.


Introduction
With the advent of the 5G and Internet of Things era, enormous amounts of data emerge in many known and unknown fields; these data submerge the world on which human beings depend. The analysis of these data poses great challenges to scholars and has become a hot spot of artificial intelligence research. [1][2][3] Time series data refer to a stream of observations collected over the course of a period of time. Time series analysis by machine learning mainly includes clustering, classification, anomaly detection, and prediction, which brings significant benefits to people in various vertical fields. 4,5 Therefore, this article studies time series prediction methods to provide more possibilities for analyzing the large number of time series in the wireless Internet of Things.
Time series prediction is widely used in the fields of industry, 6 economy, 7 environment, 8 and so on. However, most of the time series show strong nonlinear characteristics in the real world. Therefore, it is necessary to construct a prediction model by using a nonlinear prediction method to improve the model's fitting ability to the nonlinear time series data. At present, nonlinear prediction methods are mainly divided into two categories. One is the regression method, [9][10][11][12][13] which is suitable for time series prediction with slower change, and the nonlinear characteristics of the time series are easily eliminated by its linearization process. The other is to predict by the neural network in machine learning, [14][15][16] especially represented by echo state network (ESN). 17,18 It has a large, sparsely connected reservoir, and its learning method is efficient. 19,20 The approximation capability of nonlinear time series is mainly ensured through its reservoir. However, the random connection of internal neurons in the reservoir of traditional ESN leads to the randomness of the network structure, making the model training purposeless and poorly adaptable, and unable to meet the requirements of effective prediction to nonlinear time series. 21,22 Therefore, it is necessary to analyze and improve the network structure of the reservoir.
To improve the performance of the reservoir, some scholars proposed small-world networks to replace the random network. [23][24][25][26] The small-world network is a kind of network structure that reflects the real world. It has both a short average characteristic path length (ACPL) and a high average clustering coefficient (ACC), combining the advantages of random networks and regular networks. The literature [27][28][29][30] proposed a small-world echo state network (SWESN), which used a small-world network to improve the structure of the reservoir and thereby improved prediction accuracy and adaptability. [31][32][33] Kawai et al. 34 studied the performance of the reservoir under three different topologies: regular network, small-world network, and random network, which demonstrated the superiority of the small-world network. However, real-world social network nodes are randomly distributed at birth and gradually form a life circle. Therefore, some scholars applied this idea to improve the ESN through clustering methods: the internal neurons of the reservoir are clustered, and the synaptic matrix is constructed according to the clustering information. The predictive model is then built on this structure to improve the nonlinear prediction capability. 35 Deng and Zhang 36 proposed a scale-free highly clustered echo state network (SHESN), whose reservoir is uniformly clustered with both small-world and scale-free characteristics. It was successfully applied to Mackey-Glass (MG) and laser time series prediction and obtained higher prediction accuracy than the ESN. Xue et al. 37 applied the SHESN to financial time series prediction and achieved better prediction performance, which proved that the highly clustered scale-free network has strong computing power. Lei et al. 38 proposed a complex ESN based on prior clusters, in which the results of power spectrum analysis are used as prior knowledge to construct subclusters for the prediction of traffic flow time series with multi-period characteristics.
Najibi and Rostami 39 used the k-means algorithm to optimize the clustering effect of the reservoir in SHESN. However, the number of cluster heads in the reservoir must be pre-set according to prior knowledge.
This article is absorbed in the problem that the above method relies too much on prior knowledge when clustering. In order to improve the clustering performance, the nonlinear time series prediction algorithm based on the Small-World Scale-Free Network after the AIC-Optimized Subtractive Clustering Algorithm (AIC-DSCA-SSNET, AD-SSNET) is proposed. It can adaptively obtain optimal clustering scheme and construct a complex clustering network with small-world scale-free characteristics. The clustering network is used as a reservoir to improve the prediction accuracy of nonlinear time series. Moreover, it has great significance to time series analysis in the era of wireless Internet of Things.

AD-SSESN architecture
On the basis of ESN architecture, the AD-SSESN model is constructed by using the AD-SSNET as the reservoir to improve the nonlinear approximation capability. The AD-SSESN architecture is shown in Figure 1.
The AD-SSESN architecture has three layers, and the reservoir is the AD-SSNET. Its state update equation and output equation are as follows:

x(t + 1) = f(W_in u(t + 1) + W_res x(t))
y(t + 1) = W_out x(t + 1)

where u(t) denotes the input vector, x(t) denotes the state vector of the reservoir, y(t) denotes the output vector, f(·) = tanh(·) denotes the activation function, W_in denotes the input synaptic matrix, W_res denotes the reservoir synaptic matrix, and W_out denotes the output synaptic matrix. Here, W_in is randomly generated before network training, and W_res is generated by constructing the AD-SSNET; neither of them changes during the training process. W_out is calculated by the least-square method. 40
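As an illustration, the state update and output equations can be sketched in Python. This is a minimal sketch with illustrative dimensions; a random reservoir stands in for the AD-SSNET, and W_out is random here rather than trained by least squares as in the actual model:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res, n_out = 1, 100, 1

# Fixed random input weights; a random reservoir stands in for the AD-SSNET here
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W_res = rng.uniform(-0.5, 0.5, (n_res, n_res))
# Rescale W_res to a chosen spectral radius (relevant to the echo state property)
W_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_res)))

def update(x, u):
    # x(t + 1) = tanh(W_in u(t + 1) + W_res x(t))
    return np.tanh(W_in @ u + W_res @ x)

# Drive the reservoir with a toy input signal
x = np.zeros(n_res)
for t in range(10):
    x = update(x, np.array([np.sin(0.1 * t)]))

# y(t) = W_out x(t); W_out would be trained by least squares in the model
W_out = rng.uniform(-0.5, 0.5, (n_out, n_res))
y = W_out @ x
```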

AD-SSNET generation method
How to build a reservoir synaptic matrix W res that has better performance is a key point in generating the AD-SSNET model. This article proposes a generation algorithm of reservoir synaptic matrix, and its flow diagram is shown in Figure 2. First, the AIC-DSCA optimal clustering algorithm is studied to cluster randomly distributed neurons. Then, the small-world scale-free network is built by the clustering result. Finally, the synaptic information between two neurons is extracted, and the reservoir synaptic matrix W res is calculated according to it.
Determination of optimal clustering scheme by AIC-DSCA. The AIC-DSCA is used to obtain the optimal clustering scheme adaptively for randomly distributed neurons. First, the dynamic subtractive clustering algorithm (DSCA) is studied, and the maximum intra-cluster distance variance is proposed as the evaluation index to find the optimal clustering scheme under each cluster head number. Second, the Akaike information criterion (AIC) of the DSCA is introduced to determine the optimal clustering scheme by determining the optimal number of cluster heads. The specific steps are as follows:

DSCA
The parameter combination G = [r_a, u] of the traditional SCA relies heavily on expert experience to set, and different parameter combinations G = [r_a, u] lead to different clustering effects. Therefore, we propose the DSCA, which works as follows.
First, the cluster head numbers k = 1, 2, 3, 4, 5, ... are set, and the SCA is performed to obtain candidate clustering schemes. Let the coordinates of the N randomly distributed neurons be M_1 = (x_1, y_1), M_2 = (x_2, y_2), ..., M_N = (x_N, y_N), and calculate the density value of each neuron in the network. The greater the density value of a neuron, the more neurons it covers, and the higher its probability of becoming a cluster head. When selecting the first cluster head, the density function f_i^1 of neuron i is expressed as follows:

f_i^1 = Σ_{j=1}^{N} exp( − ||M_i − M_j||² / (r_a/2)² )

where r_a is a constant that defines a neighborhood radius; the density contribution falls off sharply with distance, so neurons outside this neighborhood have little effect on the density value of neuron i. The maximum density value f_opt^1 = max{f_i^1} is selected, and the corresponding neuron M_opt^1 = (x_i, y_i) is the first cluster head. To exclude the influence of the z already chosen cluster heads on the density of the remaining neurons, the density function is revised before selecting the (z + 1)th cluster head. The revised density function f_i^{z+1} of neuron i is expressed as follows:

f_i^{z+1} = f_i^z − f_opt^z · exp( − ||M_i − M_opt^z||² / (r_b/2)² )

where r_b is a constant and r_b = 1.5 r_a. The updated maximum density value f_opt^{z+1} is selected, and the corresponding neuron M_opt^{z+1} = (x_i, y_i) is the (z + 1)th cluster head. When f_opt^k / f_opt^1 ≤ u, u ∈ (0, 1), the one-time SCA ends and g cluster heads have been formed in the network.
If g and k are consistent, the parameter configuration G = [r_a, u] is recorded as a candidate clustering scheme for k cluster heads, and the parameter space is traversed by repeating the abovementioned SCA until all different candidate clustering schemes for k cluster heads have been found. Finally, if there is more than one candidate scheme under the same cluster head number, the optimal clustering scheme must be chosen by an evaluation index. For each candidate scheme, each neuron is assigned to the nearest cluster head by the nearest-distance principle. According to the distances from the neurons of each cluster to their cluster head under k cluster heads, the distance variance D_j (j = 1, 2, ..., k) of the jth cluster is calculated as

D_j = (1/Q) Σ_{i=1}^{Q} (d_i − d̄)²

where d_i denotes the distance from the ith member of the jth cluster to the cluster head, d̄ denotes the average distance from all members of the jth cluster to the cluster head, and Q denotes the number of neurons in the jth cluster. The maximum distance variance max{D_j | j = 1, 2, ..., k} is used as the evaluation index. If there are multiple candidate clustering schemes, the scheme with the minimum evaluation index is selected as the optimal clustering scheme under k cluster heads.
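The subtractive clustering core of the DSCA can be sketched as follows. This is a minimal sketch: the Gaussian density form is the standard SCA choice, `theta` plays the role of the paper's threshold u, and the two synthetic clusters are illustrative:

```python
import numpy as np

def subtractive_clustering(points, r_a, theta):
    """One pass of SCA: return cluster-head coordinates.

    r_a: neighbourhood radius; theta: stopping ratio in (0, 1)
    (the paper denotes the threshold by u). r_b = 1.5 * r_a as in the text.
    """
    r_b = 1.5 * r_a
    # Initial density of each neuron: Gaussian-weighted count of its neighbours
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    f = np.exp(-d2 / (r_a / 2) ** 2).sum(axis=1)
    f1_max = f.max()
    heads = []
    while True:
        i_opt = int(np.argmax(f))
        if heads and f[i_opt] / f1_max <= theta:
            break  # density has dropped below the threshold: stop
        heads.append(points[i_opt])
        # Subtract the chosen head's influence so nearby neurons lose density
        d2_opt = ((points - points[i_opt]) ** 2).sum(-1)
        f = f - f[i_opt] * np.exp(-d2_opt / (r_b / 2) ** 2)
    return np.array(heads)

# Two well-separated synthetic clusters of 30 points each
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal((0.2, 0.2), 0.05, (30, 2)),
                 rng.normal((0.8, 0.8), 0.05, (30, 2))])
heads = subtractive_clustering(pts, r_a=0.5, theta=0.3)
```

On this toy data the pass terminates with one head per cluster; different G = [r_a, theta] settings would yield different head counts, which is exactly the behavior the DSCA traverses.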

The AIC criterion of DSCA
It is necessary to select the optimal number of cluster heads after obtaining the optimal clustering scheme under each cluster head number. The AIC, proposed by H. Akaike in the study of time series order determination, is used as the evaluation index. Its distinctive feature is the ''principle of parsimony,'' and it is defined as follows:

AIC = 2r − 2 ln(l)

where l is the maximum likelihood estimation function of the model and r is the number of independent parameters of the model. In general, when r increases, the AIC value decreases as long as the log-likelihood function ln(l) grows faster. When r becomes too large, the growth rate of ln(l) slows, the AIC value increases, and the model tends to over-fit. Therefore, the model is best when the AIC value is smallest. The AIC criterion of the DSCA is as follows. Setting the number of neurons to N and the number of cluster heads to k, the distribution of cluster heads is M = [M_opt^1, M_opt^2, ..., M_opt^k], the number of neurons in each cluster is Q_i (i = 1, 2, ..., k), the maximum distance variance over all clusters is v_max = max{D_i | i = 1, 2, ..., k}, and the minimum is v_min = min{D_i | i = 1, 2, ..., k}. Assuming the intra-cluster distance variance is uniformly distributed on [v_min, v_max], its distribution density function is

p(D) = 1 / (v_max − v_min), v_min ≤ D ≤ v_max

Therefore, according to the log maximum likelihood estimation, the intra-cluster distance variance log-likelihood is

ln(l) = −k ln(v_max − v_min)

The cluster head number with the smallest AIC is the optimal cluster head number, and the optimal clustering scheme under the optimal cluster number is used as the final optimal clustering scheme. The flow diagram of the AIC-DSCA is shown in Figure 3, and the specific steps are as follows:
Step 1: Set the parameter configuration G = [r_a, u] and k = 1.
Step 2: Perform DSCA on N neurons to calculate the cluster head numbers.
Step 3: If the cluster head number equals k, proceed to Step 4. Otherwise, modify the parameter configuration G = [r_a, u] and return to Step 2.
Step 4: After obtaining all candidate clustering schemes under k cluster heads, select the optimal clustering scheme according to the intra-cluster maximum distance variance.
Step 5: Calculate the AIC value of the optimal clustering scheme under k cluster heads.
Step 6: If the AIC value is the minimum value, the optimal cluster head number is current k, and the final optimal scheme is determined. Otherwise k = k + 1 and proceed to Step 2.
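The AIC-based selection of the cluster head number can be sketched as follows. The per-k log-likelihood values here are hypothetical placeholders; in the algorithm they come from the intra-cluster distance variances of the optimal scheme under each k:

```python
def aic(log_likelihood, r):
    # AIC = 2r - 2 ln(l): penalizes parameter count r against fit quality
    return 2 * r - 2 * log_likelihood

# Hypothetical per-k log-likelihoods ln(l) from the clustering step
log_l = {8: -41.0, 9: -37.5, 10: -33.2, 11: -34.8, 12: -36.1}
aic_values = {k: aic(ll, r=k) for k, ll in log_l.items()}
best_k = min(aic_values, key=aic_values.get)  # cluster head number with smallest AIC
```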
Construction of SSNET. According to the clustering result of the neurons, the small-world scale-free network is constructed by intra-cluster connections and inter-cluster connections. For the inter-cluster connections, all cluster heads are fully connected; the intra-cluster connections are formed as follows.
First, the neurons are divided into two types: the cluster head neurons are backbone neurons, and the neurons close to their backbone neuron are local neurons. The candidate neighbors of a new local neuron are the set of neurons to which this new local neuron is allowed to be connected. Consider a circle whose center is the location of the backbone neuron of the current cluster and whose radius is the Euclidean distance from the new local neuron to that backbone neuron. The existing local neurons inside this circle are defined as the candidate neighbors of the newly added local neuron. Of course, the backbone neuron of the current cluster is always one of the candidate neighbors.
Then, local neurons within the cluster are chosen according to the distance from the backbone neurons and the connections are established with the existing candidate neighbor neurons.
N_max denotes the maximum number of connections of a new local neuron and controls the density of intra-cluster connections. N_c denotes the number of candidate neighbors of a new local neuron. The connection of the new local neuron based on the intra-cluster priority connection algorithm is then given by the following rules: (a) if N_max ≥ N_c, the new local neuron is fully connected to all of its candidate neighbor neurons; (b) if N_max < N_c, the new local neuron is connected to N_max candidate neighbor neurons, each candidate being chosen with the following probability:

P(i) = s_i / Σ_{j∈C} s_j

The number of connections of a neuron is called its degree. Here, s_i is the degree of the current neuron i, and C is the set of candidate neighbor neurons of the new local neuron. According to the scale-free criterion, neurons prefer to connect to neurons that already have more connections. Therefore, the probability that a new local neuron connects to an existing neuron is proportional to the degree of that existing neuron.
The flow diagram of the SSNET construction is shown in Figure 4 and its specific steps are as follows:
Step 1: Choose a cluster. All neurons are divided into backbone neurons and local neurons.
Step 2: Choose a new local neuron according to the distance from the backbone neurons, and calculate the number of candidate neighbor neurons.
Step 3: Set the maximum number of connections, and let the new local neuron connect to its candidate neighbor neurons according to the intra-cluster priority connection algorithm. If all the new local neurons in the same cluster have been added, proceed to Step 4; otherwise, proceed to Step 2.
Step 4: If all the clusters have completed the intra-cluster connections, proceed to Step 5, otherwise, proceed to Step 1.
Step 5: Let all cluster heads make a full connection, and build the small-world scale-free network.
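The intra-cluster priority connection rule for one new local neuron can be sketched as follows. This is a hedged sketch: the function and variable names are illustrative, and the candidate neighbors are assumed to have already been determined by the circle criterion described above:

```python
import numpy as np

rng = np.random.default_rng(7)

def attach_local_neuron(degrees, candidates, n_max):
    """Choose which candidate neighbours a new local neuron connects to.

    degrees: current degree s_i of every neuron; candidates: indices of the
    candidate neighbours (already determined by the circle criterion).
    Scale-free rule: connection probability proportional to existing degree.
    """
    if n_max >= len(candidates):
        return list(candidates)  # rule (a): fully connect
    p = degrees[candidates] / degrees[candidates].sum()
    # rule (b): pick n_max distinct candidates, preferring high-degree ones
    return list(rng.choice(candidates, size=n_max, replace=False, p=p))

degrees = np.array([5.0, 3.0, 1.0, 1.0])  # neuron 0 (backbone) has highest degree
chosen = attach_local_neuron(degrees, np.array([0, 1, 2, 3]), n_max=2)
```

Repeating this rule as local neurons are added in order of distance from the backbone neuron yields the degree-skewed, highly clustered subnets that give the AD-SSNET its scale-free character.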
Therefore, the AD-SSESN prediction model construction and training process of the prediction algorithm in this article are as follows:
Step 1: Initialize the model parameters.
Step 2: Obtain the AD-SSESN prediction model by constructing the AD-SSNET as a reservoir.
Step 3: Obtain the internal state matrix X and the corresponding expected output matrix Y by calculating and collecting the internal state vectors x(t) and output vectors y(t) of the reservoir by using the training datasets.

Algorithm 1: AIC-DSCA
Input: the parameter configuration G = [r_a, u] and k = 1
Output: the optimal clustering scheme and the optimal cluster head number k
1: for k = 1; k ≤ N; k++ do
2:   for i = 1; i ≤ 100; i++ do
3:     for j = 1; j ≤ 100; j++ do
4:       Get the cluster head number g by SCA with G = [r_a(i), u(j)]
5:       if g == k then
6:         Determine the optimal scheme under cluster head number k
7:         Calculate the AIC value
8:       end if
9:     end for
10:  end for
11:  if AIC == min(AIC) then
12:    Determine the optimal scheme and the optimal cluster head number k
13:    goto FINISH
14:  end if
15: end for

Step 4: Calculate the output weight W_out by the least-square method; the trained AD-SSESN prediction model is then obtained to predict nonlinear time series.
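Step 4's least-square computation of W_out can be sketched as follows. The sizes and synthetic reservoir states are illustrative; the Moore-Penrose pseudo-inverse is one standard way to solve the least-squares problem:

```python
import numpy as np

rng = np.random.default_rng(3)
n_res, n_train = 50, 300

# Collected reservoir states X (one row per time step) and desired outputs Y
X = np.tanh(rng.normal(size=(n_train, n_res)))
w_true = rng.normal(size=(n_res, 1))
Y = X @ w_true  # noiseless targets so the solution is exactly recoverable

# Step 4: W_out by least squares, here via the Moore-Penrose pseudo-inverse
W_out = np.linalg.pinv(X) @ Y
```

Because only W_out is trained while W_in and W_res stay fixed, this single linear solve is the whole training step, which is what makes ESN-style learning efficient.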

Analysis of cluster
In total, 1000 neurons were randomly distributed on a 300 × 300 plane. The AIC-DSCA is used for clustering to obtain the optimal clustering scheme. First, the DSCA is used to select the optimal scheme under different cluster head numbers, and the results are shown in Table 1. Table 1 lists the clustering information under cluster head numbers from 7 to 12. It can be seen that the clustering schemes are diverse under the same cluster head number. Therefore, the optimal clustering scheme under each cluster head number is selected by the indicator of the smallest intra-cluster maximum distance variance.
After obtaining the optimal clustering scheme under each cluster head number, the final optimal cluster head number and its optimal clustering scheme could be selected by the AIC of the DSCA. The AIC values of the optimal clustering schemes under the different cluster head numbers are then compared.
The number of cluster heads with the minimum AIC value is chosen as the optimal one, so the number of the optimal cluster heads selected by the AIC-DSCA is 10; its clustering result is shown in Figure 5.

Analysis of network characteristics
After the optimal clustering scheme is obtained, the AD network is constructed by the method explained in section ''Construction of SSNET.'' The small-world characteristics and the scale-free characteristics of the AD network are analyzed.
Analysis of small-world characteristics. The small-world characteristics of a complex network can be characterized by its ACPL and ACC. When the ACPL is small and the ACC is large, the small-world characteristics of the network are better. 41 The ACPL and ACC of the parent network and each subnet of the AD network are shown in Table 3. The ACPL and ACC of the clustering schemes under different cluster head numbers are shown in Table 4. The ACPL and ACC of the random network, small-world network, and highly clustered scale-free network of the same scale are shown in Table 5.
It can be seen from Table 3 that the ACPLs of the parent network and its subnets are small and the ACCs are large, indicating that all of them have small-world characteristics. The numbers of members and the small-world characteristics of the subnets are similar, indicating that the structure of the AD network is hierarchical and uniformly clustered in terms of small-world characteristics. It can be seen from Table 4 that the ACPL reaches its minimum and the ACC reaches its maximum when the number of cluster heads is 10. Therefore, when the number of cluster heads is 10, the small-world characteristics of the AD network are the best. It can be seen from Table 5 that the ACPL of the AD network is smaller than those of the small-world network and the highly clustered scale-free network, and its ACC is larger than those of the random network, the small-world network, and the highly clustered scale-free network. Consequently, the small-world characteristics of the AD network are more significant.
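The ACPL and ACC used in Tables 3 to 5 can be computed with plain breadth-first search and neighbor counting. A minimal sketch, verified here on a fully connected 3-node graph, where both measures are exactly 1:

```python
from collections import deque

def acpl(adj):
    """Average characteristic path length over all reachable ordered pairs (BFS)."""
    n = len(adj)
    total = pairs = 0
    for s in range(n):
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        for t, d in dist.items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs

def acc(adj):
    """Average clustering coefficient: fraction of a node's neighbour pairs
    that are themselves connected, averaged over all nodes."""
    coeffs = []
    for u in range(len(adj)):
        nbrs = adj[u]
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)
            continue
        links = sum(1 for i in nbrs for j in nbrs if i < j and j in adj[i])
        coeffs.append(2 * links / (k * (k - 1)))
    return sum(coeffs) / len(coeffs)

# Sanity check: a fully connected 3-node graph has ACPL = 1 and ACC = 1
triangle = [{1, 2}, {0, 2}, {0, 1}]
```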
Analysis of scale-free characteristics. The scale-free characteristics of a complex network can be characterized by whether the degrees of its neurons satisfy a power-law distribution. 42 In the AD network, the degree of each neuron is calculated, and the number of neurons with each degree is counted; the distribution is shown in Figure 6. It is processed logarithmically and fitted linearly, and then the correlation coefficient R is calculated. The power-law distribution is considered satisfied if |R| ≥ 0.95. The logarithmic relationship between the number of neurons and the degree of neurons and the fitted line are shown in Figure 7. By calculation, the correlation coefficient R is 0.986. Therefore, the AD network has scale-free characteristics.
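The log-log fit and correlation test for scale-free behavior can be sketched as follows. The degree histogram here is a synthetic exact power law, so the fit is essentially perfect; real degree counts would scatter around the fitted line:

```python
import numpy as np

# Synthetic degree histogram following an exact power law count(k) ~ k^(-2)
degrees = np.arange(1, 30)
counts = 1000.0 * degrees ** -2.0

# Log-log linearization; |R| >= 0.95 is taken as evidence of a power law
log_k, log_n = np.log(degrees), np.log(counts)
slope, intercept = np.polyfit(log_k, log_n, 1)
R = np.corrcoef(log_k, log_n)[0, 1]
```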
In addition, the correlation coefficients R of the parent network and its subnets are calculated separately. It can be seen from Table 6 that the correlation coefficients R are all greater than 0.95, indicating that they all have scale-free characteristics. The numbers of members of the subnets are similar, which further indicates that the structure of the AD network is hierarchical and uniformly clustered in terms of scale-free characteristics.

Analysis of prediction
Dataset preparation and testing criterion. The MG time series and Lorenz time series were used as the nonlinear time series datasets for prediction, which are generated as follows:
1. The chaotic dynamic formula of the MG system is as follows:

dx(t)/dt = 0.2 x(t − τ) / (1 + x^10(t − τ)) − 0.1 x(t)

where τ denotes the time delay; the greater the time delay τ, the stronger the nonlinearity of the system, which exhibits chaotic characteristics when τ ≥ 17. The time delay is increased from 17 to 31, and the fourth-order Runge-Kutta algorithm is used to solve the MG system; 15 nonlinear time series datasets are thus constructed. The first 2300 points of each dataset form the training set and the last 200 points form the test set.
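A minimal MG generator using fourth-order Runge-Kutta can be sketched as follows. The delayed term is held constant over each integration step (a common simplification for delay equations), and the constant initial history of 1.2 is an illustrative choice:

```python
import numpy as np

def mackey_glass(n_points, tau=17, dt=1.0, a=0.2, b=0.1, n=10):
    """Generate an MG series with fourth-order Runge-Kutta.

    dx/dt = a * x(t - tau) / (1 + x(t - tau)^n) - b * x(t)
    The delayed term is held constant over each step.
    """
    lag = int(tau / dt)
    x = np.zeros(n_points + lag)
    x[:lag + 1] = 1.2  # constant initial history (illustrative choice)
    for t in range(lag, n_points + lag - 1):
        xd = x[t - lag]  # delayed state x(t - tau)
        f = lambda y: a * xd / (1 + xd ** n) - b * y
        k1 = f(x[t])
        k2 = f(x[t] + dt * k1 / 2)
        k3 = f(x[t] + dt * k2 / 2)
        k4 = f(x[t] + dt * k3)
        x[t + 1] = x[t] + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return x[lag:]

series = mackey_glass(2500, tau=17)
```

Splitting `series` as in the text (first 2300 points for training, last 200 for testing) gives one of the 15 datasets; increasing `tau` toward 31 yields the others.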
2. The chaotic dynamic formula of the Lorenz system is as follows:

dx/dt = σ(y − x)
dy/dt = x(ρ − z) − y
dz/dt = xy − βz

The Lorenz system is solved by the fourth-order Runge-Kutta algorithm, and a time series of 2500 points is calculated; the first 2300 points form the training set and the last 200 points form the test set. The normalized root mean square error (NRMSE), averaged over repeated tests, is used as the testing criterion:

NRMSE = sqrt( (1/l) Σ_{l} Σ_{m=1}^{n_c} (y_d^l(m) − y^l(m))² / (n_c σ²) )

where l denotes the number of independent repeat tests, and 100 independent repeat tests were used in this experiment; n_t and n_c are the lengths of the training set and the test set, respectively; σ² is the variance of the true values; y_d^l(m) denotes the true value of the mth iteration in the lth independent experiment; and y^l(m) denotes the predicted value of the mth iteration prediction in the lth independent experiment.
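The NRMSE for a single test run can be sketched as follows. The paper averages over l independent repeats; the single-run form shown here normalizes the RMS error by the variance of the true values, which is one common convention:

```python
import numpy as np

def nrmse(y_true, y_pred):
    """Single-run NRMSE: RMS prediction error normalized by the
    standard deviation of the true series."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2) / np.var(y_true)))

perfect = nrmse([0, 1, 0, 1], [0, 1, 0, 1])      # exact prediction: error 0
small = nrmse([0, 1, 0, 1], [0.1, 0.9, 0.1, 0.9])  # slight under/overshoot
```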
Analysis of echo state property. Generally, the prediction model can be trained and can predict normally and stably only when the reservoir has ''the echo state.'' After normalizing the synaptic matrix, the spectral radius of the synaptic matrix W_res is its largest absolute eigenvalue λ_max, and it is used to measure the intensity of ''the echo state'' of the reservoir. For a randomly connected ESN, the spectral radius must satisfy |λ_max| < 1 so that the reservoir has ''the echo state.'' In order to analyze ''the echo state'' of the AD-SSESN, the prediction effects under different spectral radii λ_max are obtained by using the MG dataset (τ = 17) and the Lorenz dataset to test the AD-SSESN. Compared with the ESN, the results are shown in Figures 8 and 9.
It can be seen from Figures 8 and 9 that the spectral radius at which the NRMSE error begins to increase significantly is much larger for the AD-SSESN than for the ESN on both the MG and Lorenz datasets. Therefore, the ''echo state'' of the AD-SSESN is significantly enhanced, the stability of time series prediction by the AD-SSESN is maintained over a wider range of spectral radii, and the predictive power is enhanced.
Nonlinear approximation capability. The MG and Lorenz datasets are preprocessed through normalization and phase space reconstruction, and the prediction results are de-normalized. The prediction results of the AD-SSESN for the MG and Lorenz datasets are shown in Figures 10 and 11. It can be seen from Figures 10 and 11 that the predicted curves of the AD-SSESN for the MG and Lorenz datasets are consistent with the trends of the actual curves, which indicates that the AD-SSESN has a high fitting ability.
In order to further analyze the error, the small-world scale-free prediction models (X-SSESN) are constructed according to different clustering schemes in Table 4, and 15 MG datasets and Lorenz datasets are predicted respectively. The results are shown in Figure 12 and Table 7.
It can be seen from Figure 12 and Table 7 that the NRMSE errors of AD-SSESN for 15 MG datasets and Lorenz datasets are the minimum. Furthermore, in the prediction of MG datasets, the prediction accuracy can still be maintained with the increase in MG delay time.
Finally, four prediction models, AD-SSESN, ESN, SWESN, and SHESN, with the same reservoir size, sparse connectivity, and appropriate spectral radius are constructed according to the different reservoirs in Table 5. The 15 MG datasets and the Lorenz datasets are predicted by these models, and the results are shown in Figure 13 and Table 8.
It can be seen from Figure 13 and Table 8 that the AD-SSESN achieves better prediction results than the other models for the MG datasets with different time delays and for the Lorenz datasets. On the MG datasets, the AD-SSESN maintains good prediction performance when 17 ≤ τ ≤ 24. When the time delay τ > 24, the nonlinearity of the MG dataset is enhanced, and the NRMSE errors of the ESN, SWESN, and SHESN increase rapidly, while the NRMSE error of the AD-SSESN increases relatively slowly. On the Lorenz datasets, the AD-SSESN has the minimum NRMSE error and the best prediction performance. To sum up, the clustering performance of the reservoir is optimized and the ACC of the network is improved in the AD-SSESN. In addition, the ''echo state'' of the AD-SSESN is significantly enhanced because its feasible spectral radius range is extended, so that highly complex nonlinear dynamic systems can be fitted by the AD-SSESN.

Conclusion
This article proposes a nonlinear time series prediction algorithm based on the AD-SSNET, which improves the prediction accuracy of the prediction model on nonlinear time series data and brings more possibilities for the analysis of the large number of time series in the wireless Internet of Things. The optimal number of cluster heads is obtained adaptively and its clustering scheme is optimized by the AIC-DSCA; then the AD-SSNET with small-world scale-free characteristics is constructed by the intra-cluster priority connection algorithm. This network is used as the reservoir to construct the AD-SSESN prediction model. Finally, the AD-SSESN prediction model is used to predict the MG datasets and the Lorenz datasets, respectively. Experimental results show that the NRMSE error of the AD-SSESN is the minimum compared with the other small-world scale-free network prediction models with different clustering schemes, and also the minimum compared with the other three prediction models with different reservoir networks. These results show that highly complex nonlinear dynamic systems are approximated more accurately. The prediction accuracy is steadily improved because the clustering performance of the reservoir is optimized, the ACC of the network is improved, and the ''echo state'' is enhanced significantly in the AD-SSESN.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.