Energy efficient prediction clustering algorithm for multilevel heterogeneous wireless sensor networks

In designing wireless sensor networks, it is important to reduce energy dissipation and prolong network lifetime. In this paper, a new model with heterogeneity in both energy and monitored objects is proposed for heterogeneous wireless sensor networks. We put forward an energy-efficient prediction clustering algorithm that is adapted to this heterogeneous model. The algorithm enables nodes to select the cluster head according to factors such as energy and communication cost, so that nodes with higher residual energy have a higher probability of becoming a cluster head than those with lower residual energy, and the network energy is dissipated uniformly. In order to reduce energy consumption when broadcasting in the clustering phase and to prolong network lifetime, an energy consumption prediction model is established for regular data acquisition nodes. Simulation results show that, compared with current clustering algorithms, this algorithm achieves longer sensor network lifetime, higher energy efficiency and superior network monitoring quality.

In LEACH, node $i$ becomes a cluster head in the current round if a random number it draws from $[0, 1]$ is below the threshold

$$T(i) = \begin{cases} \dfrac{p}{1 - p \left( r \bmod \frac{1}{p} \right)}, & \text{if } i \in G \\ 0, & \text{otherwise} \end{cases}$$

where $p$ is the desired proportion of cluster heads, $r$ is the current round number, and $G$ is the set of nodes that have not been cluster heads in the last $1/p$ rounds. Once cluster heads are chosen, they broadcast this message to the other nodes. According to the strength of the received messages, each node determines which cluster head to join and informs that cluster head. Using a TDMA approach, cluster heads allocate time slots to cluster members and the network enters a stable phase, in which each node sends its monitored data to the cluster head in its assigned slot and the cluster head forwards the received data to the Base Station (BS) after aggregation. This completes one round, and the next round begins. In this way, each node has the opportunity to take on the cluster head role, which dissipates more energy.
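The LEACH election step described above can be sketched as follows. This is a minimal illustration that assumes the standard LEACH threshold; the function names and the `eligible` set (standing in for $G$) are hypothetical and not taken from the paper.

```python
import random

def leach_threshold(p: float, r: int, in_G: bool) -> float:
    """Standard LEACH threshold T(i) for round r (rounds counted from 0)."""
    if not in_G:
        return 0.0
    period = round(1 / p)  # number of rounds in one rotation cycle
    return p / (1 - p * (r % period))

def elect_cluster_heads(node_ids, eligible, p, r, rng=random):
    """Each node draws a uniform number in [0, 1) and becomes a cluster
    head when the draw falls below its threshold."""
    heads = []
    for i in node_ids:
        if rng.random() < leach_threshold(p, r, i in eligible):
            heads.append(i)
    return heads

# Example: 100 nodes, all still eligible, 10% desired cluster heads, round 0.
heads = elect_cluster_heads(range(100), set(range(100)), p=0.1, r=0)
```

The threshold grows toward 1 as the rotation cycle progresses, so every node that has not yet served is elected by the last round of the cycle.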
However, LEACH has some constraints: (1) it does not optimize the number of cluster heads. The probability for a random node to become a cluster head is $p$, and therefore the number of cluster heads is proportional to the number of nodes; (2) because cluster heads are randomly selected, LEACH cannot guarantee that cluster heads are uniformly distributed in the network. Meanwhile, the probability threshold does not take the energy factor into account. LEACH therefore relies on two assumptions to achieve uniform energy consumption per node: (1) the initial energy of each node is equal; (2) the energy consumed by each node when acting as the cluster head is equal. Consequently, it is difficult to apply LEACH to actual network applications.
Many researchers have studied HWSN in depth. In [10], the authors improved the LEACH algorithm and put forward an algorithm, called LEACH-C, that elects cluster heads according to residual energy. However, each node needs to know the total energy of the current network to determine whether it can become the cluster head, which requires the support of routing protocols, so a distributed implementation is difficult to achieve. SEP [11] is designed for two-level heterogeneous networks, in which there are only two kinds of nodes with different initial energies. In multi-level heterogeneous networks, however, the initial energy of nodes is randomly determined within a certain range, so SEP is not suited to such a heterogeneous environment. Heterogeneous network models based on different initial energies are further discussed in [6, 12-14]. In [15], the authors introduced a cluster head election method using fuzzy logic to overcome the defects of LEACH. They showed that the network lifetime can be prolonged by using fuzzy variables.
In [16], the authors proposed the EEHC protocol. This protocol selects cluster heads based on a weighted probability for each node related to its initial energy: the more initial energy a node has, the higher the probability that it will be selected as a cluster head. However, this protocol cannot predict energy consumption, so its performance is limited in heterogeneous networks in which some of the nodes are regular data acquisition nodes.
In [17], the authors proposed the EDFCM protocol, which applies to networks with three different kinds of heterogeneous nodes. Nodes in the network model of this protocol fall into two categories: one that manages information and another, made up of two ordinary types (type_0 and type_1), that collects data. Type_1 nodes have more complex hardware and software architectures, so they have more initial energy and greater data transfer capability. To guarantee that an optimum number of cluster heads is selected in actual operation, the authors propose a stable selection and reliable transmission protocol based on energy dissipation forecasting and clustering management. However, the application of this protocol is limited to networks with only two types of ordinary nodes.
In [18], the authors proposed the ERP clustering routing protocol for HWSN, an evolutionary algorithm with a fitness function designed with the intrinsic properties of clustering in mind. The main idea of ERP is to incorporate compactness and separation error criteria into the fitness function to direct the search toward promising solutions. Compared with LEACH and HCR, ERP can prolong network lifetime and the stability period. Compared with SEP, however, ERP gains a longer network lifetime at the expense of less stability awareness.

Heterogeneous model for wireless sensor networks
To meet the demands of efficient environmental monitoring, we describe our HWSN model, in which both initial energies and monitored objects differ. The basic assumptions of the network model are as follows: the network is located in an $M \times M$ square area (Fig. 1), $N$ sensor nodes are randomly distributed within the network, nodes are slightly mobile or stationary, and the base station is located in the middle of the area. The network performs the task of environmental monitoring, and sensor nodes monitor a variety of objects. Nodes monitoring temperature, humidity, wind direction, etc. are defined as regular data acquisition (RDA) nodes; these nodes send back messages of fixed length at a fixed interval. Nodes monitoring fire do not acquire data regularly, and the messages they send back are not regular. Therefore, nodes are heterogeneous in two ways: (1) heterogeneous data-acquisition regularity: some nodes are regular in acquiring data and some are not. Each regular node sends messages between $n_1$ and $n_2$ times in a rotation cycle, and message sizes are between $l_1$ and $l_2$ bits; (2) the initial energies of the nodes are heterogeneous.
Communication links between nodes are symmetric. Nodes do not have any location information, but they can calculate the distance to other nodes from the received signal strength. Nodes in the network are organized in clusters. Cluster heads perform data fusion and are responsible for transmitting the resulting data to the BS. There is only one BS in the network, and wireless transmission power is controllable.
Node initial energy is randomly distributed in the closed interval $[E_{min}, E_{max}]$, where $E_{min}$ is the lower bound of the energy and $E_{max}$ is the maximum node initial energy.
For any node $i$, its initial energy is $E_i$.
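The network model above can be sketched as a small data structure. This is only an illustration: the field names and the `make_network` helper are hypothetical, and the energy bounds and RDA fraction passed in the usage line are illustrative values rather than parameters confirmed by the paper.

```python
import random
from dataclasses import dataclass

@dataclass
class Node:
    node_id: int
    x: float               # position inside the M x M area
    y: float
    initial_energy: float  # E_i, drawn from [E_min, E_max]
    is_rda: bool           # regular data acquisition node?
    msgs_per_round: int    # n_j, only meaningful for RDA nodes
    msg_bits: int          # l_j, only meaningful for RDA nodes

def make_network(n_nodes, side, e_min, e_max, rda_fraction,
                 n_range, l_range, seed=0):
    """Place nodes uniformly at random and assign heterogeneous energies."""
    rng = random.Random(seed)
    nodes = []
    for i in range(n_nodes):
        is_rda = rng.random() < rda_fraction
        nodes.append(Node(
            node_id=i,
            x=rng.uniform(0, side),
            y=rng.uniform(0, side),
            initial_energy=rng.uniform(e_min, e_max),
            is_rda=is_rda,
            msgs_per_round=rng.randint(*n_range) if is_rda else 0,
            msg_bits=rng.randint(*l_range) if is_rda else 0,
        ))
    return nodes

# Usage with illustrative values (energies in joules, sizes in bits).
nodes = make_network(100, 100.0, 2.0, 5.0, 0.5, (3, 7), (2000, 6000))
```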

Energy Models
This article applies a simple energy consumption model [10] to calculate energy consumption in communication, ignoring the energy that nodes consume in computing, storage, etc. When transmitting an $l$-bit message over distance $d$, the energy consumption of the transmitter is:

$$E_{Tx}(l, d) = \begin{cases} l E_{elec} + l \varepsilon_{fs} d^2, & d < d_0 \\ l E_{elec} + l \varepsilon_{mp} d^4, & d \ge d_0 \end{cases}$$

where $E_{elec}$ is the energy dissipated per bit to run the transmitter or the receiver circuit, $\varepsilon_{fs} d^2$ and $\varepsilon_{mp} d^4$ are the amplifier energies, which depend on the transmitter amplifier model, and $d_0$ is the threshold distance that separates the two amplifier models.
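A minimal sketch of this radio model follows. The numeric constants are values commonly used in the LEACH literature and are assumptions here, not parameters confirmed by the paper. The crossover distance $d_0 = \sqrt{\varepsilon_{fs}/\varepsilon_{mp}}$ is the usual choice that makes the two branches meet, and `rx_energy` assumes the receiver spends only the circuit energy per bit.

```python
import math

# Assumed parameter values (typical in the LEACH literature).
E_ELEC = 50e-9       # J/bit, transmitter/receiver circuit energy
EPS_FS = 10e-12      # J/bit/m^2, free-space amplifier coefficient
EPS_MP = 0.0013e-12  # J/bit/m^4, multipath amplifier coefficient
D0 = math.sqrt(EPS_FS / EPS_MP)  # crossover distance, about 87.7 m

def tx_energy(l_bits: float, d: float) -> float:
    """Energy to transmit l_bits over distance d (metres)."""
    if d < D0:
        return l_bits * E_ELEC + l_bits * EPS_FS * d ** 2
    return l_bits * E_ELEC + l_bits * EPS_MP * d ** 4

def rx_energy(l_bits: float) -> float:
    """Energy to receive l_bits (circuit energy only, by assumption)."""
    return l_bits * E_ELEC

# Sending a 2000-bit message over 50 m with the assumed constants.
cost = tx_energy(2000, 50.0)
```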

Problem Description
Essentially, all WSN clustering algorithms are intended to solve the problem of unbalanced network load and to distribute energy dissipation uniformly across all nodes, so as to prolong the network lifetime as much as possible. Therefore, EEPCA must take full account of the following: (1) the algorithm should be fully distributed and self-organized. Nodes determine their own state based only on local information, and each node must decide in the clustering phase whether to become a cluster head or a member of a cluster [10]; (2) nodes with more residual energy must have a higher probability of becoming cluster head, and the cluster must be ensured a smaller communication cost, so energy is not the only factor in cluster head selection; (3) cluster load balancing must be ensured; (4) EEPCA operates in rounds. In order to save the energy that nodes consume when broadcasting in the initial clustering phase of each round, an energy prediction model of RDA nodes is established.

Calculation of distance between nodes
Nodes in the network can estimate their mutual distance from the attenuation of signal strength during transmission. In the clustering phase, all nodes broadcast with a certain transmission energy. For instance, node $i$ broadcasts information to other nodes with energy $E_i^{tran}$, including its message sending cycle $t_i$, message length $l_i$ and energy information $E_i$.
Node $j$ detects the received signal strength (received energy) $E_{j,i}^{rec}$ while receiving the message.
The relationship between transmission energy and reception energy is given in [20] in terms of $d_{i,j}$, the relative distance between node $i$ and node $j$, and the distance-energy gradient, whose value varies from 1 to 6 according to the physical environment in which the sensor network operates. From this relationship, node $j$ obtains the distance between $i$ and $j$. Each node then establishes a routing table of neighboring nodes based on the received data and saves the relevant information of all nodes within its communication range. Every node in the network is identified by a unique integer value, its ID. The information stored in the routing table includes the distance between the node and each neighboring node, the cluster head node's ID, the distance to the cluster head, the current energy and the predicted energy consumption.
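The distance estimate can be illustrated with a short sketch. It assumes a simple power-law path loss, $E^{rec} = E^{tran} / d^{\alpha}$, with $\alpha$ standing for the distance-energy gradient; the exact relationship from [20] may include additional constants. The function names and the table layout are hypothetical.

```python
def estimate_distance(e_tran: float, e_rec: float, alpha: float) -> float:
    """Invert the assumed path-loss law E_rec = E_tran / d**alpha."""
    if e_tran <= 0 or e_rec <= 0 or alpha <= 0:
        raise ValueError("energies and alpha must be positive")
    return (e_tran / e_rec) ** (1.0 / alpha)

def update_routing_table(table, neighbor_id, e_tran, e_rec, alpha, energy):
    """Store the estimated distance and the advertised energy of a neighbor."""
    table[neighbor_id] = {
        "distance": estimate_distance(e_tran, e_rec, alpha),
        "energy": energy,
    }
    return table

# Node j hears node 7: sent with energy 1.0, received 0.01, gradient 2.
table = update_routing_table({}, 7, 1.0, 0.01, 2.0, energy=1.8)
```

Because the gradient depends on the physical environment, it would have to be calibrated for each deployment before the estimate is usable.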

Cluster head selection
The cluster head node has to perform extra functions such as data fusion and relaying messages, so its energy consumption rate is much higher than that of ordinary nodes. In order to prevent some nodes from dying too soon due to excessive energy cost, the nodes with more residual energy should be given greater opportunity to become cluster heads and all nodes take their turns to be cluster head nodes.
Let $p_{opt}$ be the optimal proportion of cluster heads and $p_i$ the probability for node $i$ to be selected as the cluster head. Obviously, if the current energy of all nodes is equal, setting $p_i = p_{opt}$ can ensure that all nodes die at the same time. In an energy-heterogeneous WSN, calculating $p_i$ is much more complicated. Currently, many clustering algorithms for HWSN determine $p_i$ from the ratio of a node's current residual energy to the average energy of the entire network, but the latter is very difficult to obtain [13], especially for networks in which different nodes monitor different objects. Consequently, the estimated average energy is likely to carry a major error.
Ideally, nodes are distributed uniformly and send back data at identical frequency and length.
Let $d_{toBS}$ be the average distance between a head node and the BS and $d_{toCH}$ the average distance between the member nodes in a cluster and the head node. Both distances can be derived as in [10,21], and the optimal number of cluster heads, $k_{opt}$, as in [13]. The proportion of optimal cluster heads is therefore $p_{opt} = k_{opt}/N$. In the initial stage of clustering, through broadcast among nodes in the network, any node $i$ learns that there are a total of $n$ nodes within its communication range, and how many of them lie closer than $d_0$ and how many lie at $d_0$ or farther. In the ideal case in which nodes in the network are uniformly distributed and every data transmission sends data of identical length $l$, the number of nodes in each cluster is $N/k_{opt} = 1/p_{opt}$, from which the number of nodes of each of these two types follows. The random distribution of nodes can be viewed as a Poisson point process [21].
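For orientation, the closed forms most often used in the LEACH and DEEC literature for these quantities can be evaluated as below. This is a sketch under stated assumptions: the expressions $d_{toCH} = M/\sqrt{2\pi k}$, $d_{toBS} = 0.765\,M/2$ and $k_{opt} = \sqrt{N/(2\pi)}\,\sqrt{\varepsilon_{fs}/\varepsilon_{mp}}\,M/d_{toBS}^2$ are the standard ones for a BS at the center of an $M \times M$ field, and whether the paper uses exactly these forms cannot be confirmed from this text. The function names and the amplifier coefficients in the usage line are also assumptions.

```python
import math

def avg_distance_to_head(side: float, k: float) -> float:
    """Assumed closed form d_toCH = M / sqrt(2*pi*k)."""
    return side / math.sqrt(2 * math.pi * k)

def optimal_cluster_heads(n_nodes: int, side: float,
                          eps_fs: float, eps_mp: float):
    """Return (k_opt, p_opt) under the assumed standard closed forms."""
    d_to_bs = 0.765 * side / 2  # assumed average head-to-BS distance
    k_opt = (math.sqrt(n_nodes / (2 * math.pi))
             * math.sqrt(eps_fs / eps_mp) * side / d_to_bs ** 2)
    return k_opt, k_opt / n_nodes

# 100 nodes in a 100 m x 100 m field with assumed amplifier coefficients.
k_opt, p_opt = optimal_cluster_heads(100, 100.0, 10e-12, 0.0013e-12)
```

Under these forms $k_{opt}$ grows with $\sqrt{N}$, so $p_{opt}$ shrinks as the network becomes denser.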
A circle is swept out when a radius revolves around the center, so consider the distribution of points along a random radius. Points are distributed uniformly in the circle, so the number of points within a given distance of the center is proportional to the square of that distance. The probability density of points along a random radius follows from this, with $R$ denoting the radius length, and it simplifies the calculation of the expected distance $E(d_i)$. By formulas (16) and (17), one obtains the expected average distance of nodes whose distance to the cluster head is less than $d_0$, and likewise that of nodes whose distance to the cluster head is more than $d_0$. From these, the ideal average energy consumption of one data transmission within the cluster is obtained. By formulas (11) and (20), this yields the communication cost factor of node $i$, which influences the probability of cluster head election. Integrating the node energy factor and the communication cost factor gives the probability $p_i$ for node $i$ to become the cluster head node, in which two calculation factors regulate the respective proportions of the energy factor and the communication cost factor.
The LEACH threshold formula $T(i)$ is improved in two steps: (1) $T(i)$ is extended to multi-level heterogeneous networks; (2) in EEPCA, the energy factor and the communication cost factor are taken into account in the calculation of $T(i)$, as shown in formula (23), where $r_s$ is the number of rounds for which a node has failed to be selected as the cluster head. Once the node is elected, $r_s$ is reset to 0.
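As a worked check of the radial-distribution step, assume that member nodes are uniformly distributed over a disc of radius $R$ centered on the cluster head. The symbols $F$ and $f$ are introduced here for illustration, and the results need not coincide term by term with formulas (16) to (19) of the paper.

```latex
F(r) = \Pr(d \le r) = \frac{\pi r^2}{\pi R^2} = \frac{r^2}{R^2},
\qquad
f(r) = \frac{dF}{dr} = \frac{2r}{R^2}, \quad 0 \le r \le R,
\qquad
\mathbb{E}[d] = \int_0^R r\, f(r)\, dr = \frac{2R}{3}.

% Conditional means on either side of the threshold distance d_0 (with d_0 < R):
\mathbb{E}[d \mid d < d_0]
  = \frac{\int_0^{d_0} r\, f(r)\, dr}{F(d_0)}
  = \frac{2 d_0}{3},
\qquad
\mathbb{E}[d \mid d \ge d_0]
  = \frac{\int_{d_0}^{R} r\, f(r)\, dr}{1 - F(d_0)}
  = \frac{2\,(R^3 - d_0^3)}{3\,(R^2 - d_0^2)}.
```

As $d_0$ approaches 0 the second conditional mean reduces to $2R/3$, which matches the unconditional mean.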

Energy consumption prediction mechanism
Obviously, after the network completes a round, a new node needs to be selected as the cluster head. Because the energy factor and the communication cost factor must be re-evaluated to determine the probability for each node to become the cluster head, the current residual energy of the nodes must be obtained. The easiest way is for all nodes in the network to broadcast as in the first round of clustering. However, broadcasting in every round of clustering consumes considerable energy, so this paper establishes an energy consumption prediction mechanism for RDA nodes. In round $r-1$, any node $j$ sends messages of length $l_j$ to cluster head node $i$ a total of $n_j$ times, and the distance between $i$ and $j$ is $d_{i,j}$; from these quantities the energy consumption of node $j$ in that round can be predicted. Because of factors such as changes in the network environment, when round $r$ starts, all nodes need to be re-clustered and new cluster head nodes need to be elected. Node $j$ determines whether its current residual energy is close to the residual energy predicted in the last round.
If the prediction error is less than a constant threshold, the energy prediction error can be tolerated: in the initial phase of round $r$, node $j$ does not broadcast its energy information, and the remaining nodes update node $j$'s energy information in their routing tables according to the calculation results.
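The prediction check can be sketched as follows. This is an illustration under stated assumptions, not the paper's formula: the predicted consumption is taken as $n_j$ transmissions of $l_j$ bits over $d_{i,j}$ under the first-order radio model, the error measure is a relative absolute error, and the radio constants and all function names are hypothetical.

```python
import math

# Assumed radio constants (typical LEACH-literature values).
E_ELEC = 50e-9       # J/bit
EPS_FS = 10e-12      # J/bit/m^2
EPS_MP = 0.0013e-12  # J/bit/m^4
D0 = math.sqrt(EPS_FS / EPS_MP)

def tx_energy(l_bits: float, d: float) -> float:
    """First-order radio model for transmitting l_bits over distance d."""
    amp = EPS_FS * d ** 2 if d < D0 else EPS_MP * d ** 4
    return l_bits * (E_ELEC + amp)

def predict_residual(prev_residual, n_msgs, l_bits, d):
    """Residual energy expected after n_msgs regular transmissions."""
    return prev_residual - n_msgs * tx_energy(l_bits, d)

def must_broadcast(actual, predicted, tolerance):
    """True when the relative prediction error exceeds the tolerance,
    so the node has to announce its real residual energy."""
    if predicted <= 0:
        return True
    return abs(actual - predicted) / predicted > tolerance

# Node j: 2.0 J at the start of the round, 5 messages of 4000 bits over 30 m.
predicted = predict_residual(2.0, 5, 4000, 30.0)
silent = not must_broadcast(1.9988, predicted, tolerance=0.01)
```

Neighbors can run the same `predict_residual` computation from the values already stored in their routing tables, which is what lets an RDA node stay silent when its prediction holds.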

Establishment of simulation environment
This paper analyzes and compares the performance of EEPCA through simulation experiments. The experiment simulates a high-density sensor network for environmental monitoring, randomly formed within a 100 m × 100 m area. After the formation, nodes become static. Without loss of generality, 100 sensor nodes are randomly distributed in this area, and the BS is assumed to be located in the center of the area. In order to compare with other protocols, the impact of random factors such as signal collision and wireless channel interference is ignored. The parameters used in this experiment are listed in Table 1. This paper compares the performance of EEPCA with that of LEACH, SEP and EDFCM. All results, unless otherwise stated, are average values over 100 independent experiments. When some of the nodes in the network die, node density becomes significantly lower and, because of the reduced number of nodes, network load is more likely to be uneven. Therefore, a greater value of the calculation factor for communication cost can help improve algorithm performance. In subsequent experiments, the values of the two calculation factors are fixed at 0.7 and 0.3.
In the above experimental environment, EEPCA is compared with LEACH, SEP and EDFCM to analyze the impact of the EEPCA cluster head selection mechanism on algorithm performance when all nodes are heterogeneous. The simulation results in Fig. 3 show how the number of dead nodes varies over time under the different algorithms. As can be seen in Fig. 3, LEACH cannot make good use of the additional energy of heterogeneous nodes: its stable period is very short and nodes die at a fixed rate. Compared with LEACH, SEP has a longer stable period. The EEPCA and EDFCM curves are lines with a smaller slope relative to the X-axis. Because EEPCA distributes energy consumption uniformly over the nodes of the heterogeneous network, the death times of the first and the last node are relatively close. Fig. 3 also shows that, compared with LEACH and SEP, EEPCA prolongs network lifetime by 129% and 55%, respectively.
In the above experimental environment, the proportion of heterogeneous nodes in the total number of nodes is varied and the performance of each algorithm is observed. Fig. 4 presents the number of rounds from the beginning to the death of the first node when the proportion of heterogeneous nodes varies from 0 to 100%. In this experiment, the initial energy of all non-energy-heterogeneous nodes is 2 J. Before the death of 10% of the nodes, the network can send data of high quality and reliability back to the BS [13]. Fig. 5 therefore presents the number of rounds from the beginning to the death of 10% of the nodes, namely the stable period. Because LEACH is not a clustering algorithm for heterogeneous networks, it does not take the energy difference between nodes into account and instead treats all nodes equally. Therefore, in LEACH, the attainable network stable period shrinks quickly as the proportion of heterogeneous nodes increases. SEP obtains a stable period 25% longer than that of LEACH, which is basically consistent with the experimental results presented in [11]. Because EDFCM takes the heterogeneous energy of different nodes into account, its first node dies later than in SEP and it obtains a longer stable period than SEP. EEPCA takes into account the energy consumption of nodes in the communication process in addition to residual energy, so its stable period declines significantly more slowly than that of the other algorithms as the proportion of heterogeneous nodes increases. Therefore, with a greater proportion of heterogeneous nodes, EEPCA obtains a longer stable period than the other algorithms.
To go further, RDA nodes are introduced into the experiment. All nodes in the network are energy heterogeneous and 50% of the nodes are RDA nodes. Meanwhile, because of factors such as changes in the environment, 10% of the nodes are malfunctioning. All RDA nodes send messages 3-7 times in a round, and message sizes are chosen randomly between 2000 and 6000 bits. We examine the impact of the tolerance constant of the energy prediction mechanism on the network stable period. Because of the malfunctioning nodes in the network, errors in energy prediction are inevitable. With the constant at 1, nodes broadcast their energy information whenever energy prediction errors happen. In this case, it is difficult to achieve substantial savings in energy consumption. If the value of the constant is too low, nodes do not broadcast their energy information even when there are large errors between actual residual energy and predicted energy. In this case, nodes with lower actual residual energy may have a higher chance of becoming the cluster head, and the length of the network stable period is affected. Obviously, owing to the energy consumption prediction mechanism, broadcast frequency in the clustering phase of each round is effectively reduced. Therefore, in a network that is heterogeneous in two ways, initial energies and monitored objects, EEPCA significantly improves the network stable period compared with the other three algorithms. The heterogeneity model of EDFCM, however, fails to take RDA nodes into account, so when these nodes are added, the stable period of EDFCM declines considerably. For an algorithm running in rounds, monitoring quality can be measured by the total number of times all nodes in the network collect data. Fig. 8 shows the case in which all nodes are energy heterogeneous, 50% are RDA nodes and 10% of the nodes are malfunctioning. In EEPCA, the number of messages received by the BS rises linearly for a long period of time, while in the other algorithms the growth rate of the number of messages received by the BS begins to decline earlier.
Summing the total number of messages sent back to the BS by all nodes under these four algorithms up to network failure, the amount of data collected by EEPCA is much larger than that collected by the other three algorithms. Therefore, EEPCA has better network monitoring quality.

Conclusion
In this paper, we describe an HWSN model in which both initial energies and monitored objects differ. We present an energy-efficient prediction clustering algorithm, EEPCA, for multi-level heterogeneous sensor networks. In EEPCA, each node independently decides whether to become the cluster head based on an energy factor and a communication cost factor, so that the probability of cluster head election is related to the node's current residual energy and to its average communication cost after being selected. At the same time, considering that WSNs are frequently used to monitor objects such as temperature and humidity, which require data to be reported regularly and in messages of usually fixed length, an energy consumption prediction mechanism is established for RDA nodes. Simulation results show that, compared with LEACH, SEP and EDFCM, EEPCA achieves longer lifetime, higher energy efficiency and better network monitoring quality.
In future work, we will further improve the residual energy prediction mechanism so as to achieve greater prediction accuracy and prolong network lifetime as much as possible. In addition, problems such as message transmission and energy prediction in networks where one node monitors a variety of different objects will be considered. Our ultimate goal is to apply the EEPCA algorithm in practice.