DI-GEP: A New Lifetime Extending Algorithm for Target Tracking in Wireless Sensor Networks

Wireless sensor networks (WSNs) are widely used for detecting, locating, and tracking moving objects. The cheap, low-power, energy-limited sensors deployed over large areas may consume a large portion of their energy during tracking, which can disable the whole network. In this paper, a new energy-efficient method based on Distributed Incremental Gene Expression Programming (DI-GEP) is proposed to collaboratively mine the moving patterns of targets, so that sensor nodes can be turned on or off at the appropriate times to save more energy. In addition, an adjustable sliding window is designed to quickly train on the latest collected location data and improve the efficiency of DI-GEP. The simulation results show that the proposed method prolongs the network lifetime by around 25% compared with EKF and ECPA.


Introduction
Recent advances in low-power micro-electro-mechanical system (MEMS) technology, wireless communications, and digital electronics have made it possible to design and develop highly integrated, yet low-cost, low-power, multifunctional microsensor nodes with sensing, processing, and wireless communication capabilities. Once deployed in a certain region, a wireless sensor network (WSN), composed of thousands of sensor nodes, can work for several years. Through cooperative processing among these sensor nodes, WSNs serve many areas, such as civil, military, and health applications. For example, WSNs can be deployed in a hospital to track and monitor patients and to collect their physiological data remotely and continuously. Unlike traditional networks, WSNs are self-organized, application specific, and data centric [1].
A wireless sensor node is typically battery operated. Thus, the most important constraint in WSNs is the low energy consumption requirement of their sensor nodes. Sensor nodes carry limited, generally irreplaceable, battery power sources. WSNs must therefore focus primarily on power conservation and provide built-in trade-off mechanisms that give end users the option of prolonging network lifetime at the cost of reduced quality of service (QoS). Sensor nodes may fail due to energy depletion and cause the network to fail, so it is very important for a WSN to operate energy efficiently. This growing research interest motivates us to develop energy-efficient protocols and algorithms for WSNs.
Target tracking is an important application of WSNs [2]. Bayesian networks and Kalman filtering are two classical methods for this task. One possible solution is as follows. The system state includes the position, direction, and velocity of the target. At each step, sensors near the target form a cluster and select a leader to perform the Kalman filtering, and the updated state is forwarded to the cluster leader chosen for the next step. Kalman filtering is straightforward to implement in a centralized environment, but it is difficult in extremely distributed environments such as WSNs because of the energy constraints and low computation capability of sensor nodes.
International Journal of Distributed Sensor Networks
Target tracking applications in WSNs are always limited by the inherent energy constraints of sensor nodes. To improve the energy efficiency of such applications, this paper proposes a new scheme based on Gene Expression Programming (GEP) [3], adapted to fit a distributed environment. GEP works well in modeling the moving patterns of targets without a priori knowledge. Based on the historical location information of the target, GEP automatically evolves a trajectory of a moving object. This paper makes the following contributions.
(1) A new algorithm named Distributed Incremental Gene Expression Programming (DI-GEP) is proposed to mine moving patterns of targets. The basic idea is that DI-GEP runs at multiple collaboratively working sensor nodes to mine the trajectory of a target.
(2) An adjustable sliding window is adopted to ensure that distributed GEP can quickly train on the latest collected location data. When new location data are received and the prediction error, which is defined and calculated by (7), exceeds a certain threshold, old location data are discarded. This policy ensures that succeeding evolutions find the latest moving patterns in an energy-efficient way.
(3) Extensive simulations are conducted on OMNeT++, a discrete event simulator, to show that the new algorithms prolong the network lifetime by about 25% on average when compared with other algorithms, namely, EKF and ECPA.
The rest of the paper is organized as follows. Section 2 presents the related work on energy-saving algorithms. Section 3 gives the preliminaries, including GEP and target tracking. Section 4 introduces the target detection model. Section 5 formally defines the problems. Section 6 proposes the main algorithms in our scheme. Section 7 gives the experimental analysis. Section 8 concludes this paper and gives the future research directions.

Related Work
There are many research efforts on target detection and tracking in WSNs, covering several aspects of collaborative signal processing [2,4,5] and a real-time application that lets biologists detect the presence of individuals [6]. A set of approaches presented in [7][8][9] were proposed recently to solve the target localization and tracking problem with proximity binary sensors, which transmit only one bit of information to indicate whether a target is present. The information transmitted among sensor nodes was greatly reduced, while the localization error was increased. Shrivastava et al. [8] proved that the accuracy (spatial resolution) in tracking a target is of the order of $1/(\rho R)$, where $R$ is the sensing radius and $\rho$ is the sensor density. This result articulates the common intuition that, for a fixed sensing radius, the accuracy improves linearly with an increasing sensor density, and it shows that, for a fixed number of sensors, the accuracy improves linearly with an increase in the sensing radius. Dai et al. present a lightweight target tracking method based on densely distributed sensor networks [10,11] and also propose a new node deployment policy for target tracking applications in WSNs [12] to further improve target tracking performance and quality, including accuracy, network lifetime, energy consumption level, trend analysis, and so forth.
The lifetime of a WSN depends greatly on power consumption from each sensor node. Energy-efficient algorithms, protocols, and node hardware and software designing technologies can help prolong the lifetime of the network. Several approaches have been proposed at hardware and software levels to design energy efficient CPU, OS, algorithms, and communication protocols [1]. Dynamic power management (DPM) schemes have been proposed in [13][14][15] to reduce the power consumption by selectively turning off idle components, such as radio frequency (RF) transmitter, RF receiver, sensing device, A/D converter, and the sensor node.
Target tracking applications are special and have their own characteristics. It is unnecessary to turn on all sensor nodes because an object only appears at certain time and place. It is feasible to turn off some idle nodes if we can predict the time and place where the object will appear. Classical target tracking algorithms such as Bayesian Network and Kalman filtering cannot be directly used to predict moving patterns of targets in WSNs due to resource limitations.
To track moving targets energy efficiently, Allegretti et al. [16] proposed a solution based on CA (Cellular Automata) to reduce long distance communications among nodes because of its locally data exchanging scheme, but a higher power consumption is introduced because it cannot turn off those nodes that are far away from the moving object. Qing et al. [17] proposed ECPA (Enhanced Closest Point Approach) to predict the location of targets during the phase of moving, but the velocity and direction calculation algorithms with regard to the targets are computation intensive for sensor nodes, which often have low power and computation capability.

Introduction of Gene Expression Programming.
Gene Expression Programming (GEP) was proposed by Ferreira in 2006 [3]. As a new member of the Evolutionary Algorithm (EA) family, GEP is widely applied in data mining areas such as function finding, classification, association rule mining, time series prediction, parameter optimization, and digital circuit design. In GEP, the genotype (chromosome) and the phenotype (expression tree (ET)) are separated. Without prior knowledge, GEP automatically evolves over training data and discovers knowledge as mathematical formulas depicting the movement patterns of moving objects in WSNs.
In GEP, an individual, that is, a solution to a problem, is represented as a linear fixed-length string named a chromosome. It contains one or more genes. Each gene is decoded into a nonlinear expression tree (ET). Decoded ETs are linked together by prespecified linking function symbols such as plus (+) and minus (−). One chromosome represents one formula, that is, the solution to a specific problem. Genetic operations are applied to chromosomes (the genotype), and selection operations are performed on ETs.
A gene in GEP consists of a head and a tail. The head contains symbols from either the function symbol set (F) or the terminal symbol set (T), and the tail contains only symbols from the terminal symbol set. The tail length satisfies (1), which guarantees that a gene can be decoded into a valid ET:
$$t = h(n - 1) + 1. \tag{1}$$
In (1), $n$ is the maximum number of arguments of any function in F, $h$ is the head length, and $t$ is the tail length. Example 1 below shows a 2-gene chromosome in GEP and its ETs. Figure 1 shows the 2-gene chromosome, its two sub-ETs, which are linked by the linking function "plus," denoted as "+," and the mathematical function obtained from the linked expression tree.
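To make the head and tail structure concrete, the following is a minimal sketch of how a single gene can be decoded into an expression tree and evaluated. It assumes the usual GEP tail-length relation t = h(n − 1) + 1 and the standard breadth-first (Karva) reading of a gene. The function set, the terminal names `t` and `a`, and the helper names `tail_length`, `decode`, and `evaluate` are illustrative assumptions, not part of the paper.

```python
# Arity of each function symbol (assumed set); any other symbol is a terminal.
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2}


def tail_length(head_length, max_arity):
    """Tail length t = h * (n - 1) + 1 for head length h and maximum arity n."""
    return head_length * (max_arity - 1) + 1


def decode(gene):
    """Decode a gene string into a nested (symbol, children) expression tree.

    Symbols are read left to right and attached level by level
    (breadth-first). Symbols left over at the end of the tail are unused.
    """
    symbols = iter(gene)
    root = (next(symbols), [])
    level = [root]
    while level:
        next_level = []
        for symbol, children in level:
            for _ in range(ARITY.get(symbol, 0)):
                child = (next(symbols), [])
                children.append(child)
                next_level.append(child)
        level = next_level
    return root


def evaluate(tree, env):
    """Evaluate an expression tree; env maps terminal symbols to numbers."""
    symbol, children = tree
    if not children:
        return env[symbol]
    left, right = (evaluate(child, env) for child in children)
    if symbol == "+":
        return left + right
    if symbol == "-":
        return left - right
    if symbol == "*":
        return left * right
    return left / right


# Head "+*a" (h = 3) and tail "ttat" (t = 4) decode to (t * t) + a.
gene = "+*attat"
print(tail_length(3, 2), evaluate(decode(gene), {"t": 3, "a": 2}))
```

Even a head made entirely of binary functions, such as `+++`, consumes exactly four terminals, which is why a tail of length 4 is always sufficient for h = 3 and n = 2.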

Target Tracking in WSNs.
There are several target tracking methods that adopt distinct models. The Kalman filtering method requires the velocity, direction, and acceleration of a moving object in order to predict its next location. It is impractical in WSNs to equip every node with the devices needed to measure these quantities, because nodes must remain low cost and run on a limited power supply. It is therefore critically important to design energy-efficient target tracking algorithms for WSNs.
The tracking model is described in Figure 2. The monitored region is covered with sensor nodes that are distributed manually or randomly. A target moves along a trajectory that is unknown in advance and is detected by some sensor nodes, which are depicted as solid black circles. Suppose that the locations the moving object passed are known at times $t_0, t_1, t_2, \ldots, t_i$. A question arises: where will it appear at times $t_{i+1}, t_{i+2}, \ldots$?
Once the trajectory of a moving object is found and expressed as a mathematical function, the following tasks are straightforward.
(i) Predicting the locations at which the moving object will appear at times $t_{i+1}, t_{i+2}, \ldots$.
(ii) Activating the necessary sensor nodes and turning off unrelated sensors to save energy.
Target-tracking applications need careful consideration of trade-off between tracking error and energy consumption. The tracking error is defined by the average target location estimation error of sensor nodes. The better trajectory leads to a better target location estimation. To achieve the above goals, we propose three policies.
(1) DI-GEP, that is, a trajectory discovery algorithm based on distributed incremental gene expression programming. It mines the trajectory in order to reduce tracking errors.
(2) Node scheduling algorithm. It activates the necessary nodes and turns off unrelated sensors at certain times in the future to save energy.
(3) Sliding window strategy is used to improve performance of DI-GEP.
The experiments in Section 7 will demonstrate the effectiveness and efficiency of proposed methods.

Target Detection Model
Sensor nodes receive a physical signal and convert it into an electrical signal. Based on the variation of the electrical signal, sensors can detect the existence of a target. To describe the model formally, we make the following three assumptions.
$A_2$: There is a single target at any time. Note that, by Assumption ($A_2$), a present target is denoted by ($H_1$), and an absent target is denoted by ($H_0$). The criterion is based on the formulations in [18], where $w_i$ is the signal obtained by sensor $s_i$ and the noise is represented by $n_i$. Several physical signals, such as sound and electromagnetic waves, have a signal strength that decays according to a power law.
$A_3$ (borrowed from [19]): Let $w_t$ be the power emitted by the target. In (3), $d_0$ is determined by the target shape and size; it is set small enough to satisfy $d_i > d_0$, where $d_i$ is the distance between the target and sensor node $s_i$, and $k$ is the decaying factor, which is set between 2 and 5 according to the physical signal and its environment.
This study adopts the practical target detection model shown in Figure 3, where $r_l$ is the lower bound (LB) of the sensing range, $p_i$ is the probability that the target is detected by sensor node $s_i$, and $r_u$ is the upper bound (UB) of the sensing range.

Problem Formulation
A trajectory of a moving object is treated as a sequence of time-stamped locations that are collected by sensor nodes around the target. It is described as follows.
Definition 2 (Trajectory). A trajectory of a moving object is a time sequence of locations sampled with time interval $\Delta t$, where, for all $i \in [0, n]$, $t_i < t_{i+1}$ and $t_{i+1} = t_i + \Delta t$, and $(x_i, y_i)$ is the 2D point that represents the location of the target at time $t_i$.
$\Delta t$ is used to sample the locations collected by sensors in order to improve the energy efficiency and performance of our algorithms, because the location data may be of large scale, which would put a great burden on sensors and exhaust a great deal of energy through the huge amount of computation and communication.
$P(t)$ describes various kinds of trajectories, such as line segments, quadric curves, cubic curves, and splines. Once $P(t)$ is obtained, it is easy to achieve single-step or multiple-step location predictions. $P(t)$ can be obtained by a trajectory mining algorithm.
In target tracking applications, much of the historical location data is unnecessary during the evolving process in distributed GEP. To deal with this problem, we adopt a sliding window prediction method (SWP) that loads only the latest historical data to train trajectories. The basic idea of SWP is given below.

(a) Find a formula $P(t) = [f(t), g(t)]$ from $h$ samples and predict the location at time instant $m$ ($m > n - 1$) by (6). Example 3 illustrates the phases of evolution of trajectories.
(b) During the evolving process, the size $h$ of the sliding window determines how many historical location data are used. A smaller $h$ leads to less energy consumption and faster convergence. $h$ is adjusted based on the location prediction error $\varepsilon$, the geometric distance between the predicted value and the real measurement:
$$\varepsilon = \sqrt{(\hat{x} - x)^2 + (\hat{y} - y)^2}, \tag{7}$$
where $(\hat{x}, \hat{y})$ is the location predicted by $P(t)$ and $(x, y)$ is the location measured at the same time instant. $\varepsilon$ should be as small as possible. In target-tracking applications, the trade-off between energy consumption and prediction accuracy is balanced by adjusting the acceptable value of $\varepsilon$ to satisfy different application requirements. In densely distributed sensor networks, tracking-tolerant environments, or areas that require a fast response time, $\varepsilon$ can be set to a larger value to save energy. In areas that require higher tracking accuracy for slow-moving targets, $\varepsilon$ can be set to a smaller one. In sum, $\varepsilon$ cannot be set to zero, since the estimated trajectory always deviates somewhat from the actual path of the target.
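As an illustration of the sliding window idea, the sketch below computes the prediction error as a Euclidean distance and keeps only the most recent samples. It is a simplified reading, not the paper's implementation: the window size `h` is fixed here, whereas the text adjusts it, and the names `prediction_error`, `SlidingWindow`, and `needs_retraining` are hypothetical.

```python
import math
from collections import deque


def prediction_error(predicted, measured):
    """Euclidean distance between a predicted and a measured (x, y) location."""
    return math.hypot(predicted[0] - measured[0], predicted[1] - measured[1])


class SlidingWindow:
    """Keeps only the h most recent (t, x, y) samples (h fixed in this sketch)."""

    def __init__(self, h):
        self.samples = deque(maxlen=h)

    def add(self, t, x, y):
        # When the window is full, the oldest sample is dropped automatically.
        self.samples.append((t, x, y))

    def needs_retraining(self, trajectory, tau):
        """True if the latest sample deviates from trajectory(t) by more than tau."""
        t, x, y = self.samples[-1]
        return prediction_error(trajectory(t), (x, y)) > tau


# Usage: a target moving along x = t, y = 2t, then a sample that leaves the line.
window = SlidingWindow(h=3)
for t in range(5):
    window.add(t, t, 2 * t)
line = lambda t: (t, 2 * t)
print(window.needs_retraining(line, tau=2.0))  # latest sample lies on the line
window.add(5, 9, 10)
print(window.needs_retraining(line, tau=2.0))  # latest sample is 4 units away
```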
Example 3. The historical locations are listed in Table 1.

Fitness Evaluation of Individual.
In evolutionary computation, fitness functions and selection environments are the two most important faces of fitness and are, therefore, intricately connected. When we speak of the fitness of an individual, it is, on the one hand, always relative to a particular environment and, on the other, relative to the measure (the fitness function) we use to evaluate it. Consequently, success on a problem depends not only on the way the fitness function is designed but also on the quality of the selection environment [3].
Combining the fitness evaluation and the prediction error, DI-GEP calculates the fitness of each individual in distinct populations by (8), where $\varepsilon_i$ is the evaluation error of the $i$th location data and $E_i$ is the fitness value of the $i$th individual.

Trajectory Mining Algorithm.
The main steps of trajectory search algorithms are given below.
(1) Sensor nodes are activated based on the node scheduling algorithm.
(2) Communications occur among sensor nodes when one node succeeds in obtaining a trajectory and notifies other nodes.
(3) Other nodes stop running their algorithms and obtain the trajectory to predict future locations of the moving object.
A node stops evolving when one of the following conditions is met.
(1) The maximum number of generations is reached.
(3) One or more other nodes send a stopping signal to the node.
(4) The node succeeds in obtaining a trajectory.
The implementation of DI-GEP is described in Algorithm 1. It mines a trajectory represented as an individual in DI-GEP.
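The stopping conditions listed above can be sketched as a per-node control loop. This is only a skeleton under stated assumptions: the evolutionary step, the stop-signal check, and the acceptance test are passed in as callables, and the names `run_node`, `evolve_one_generation`, `stop_requested`, and `good_enough` are hypothetical rather than taken from Algorithm 1.

```python
def run_node(max_generations, evolve_one_generation, stop_requested, good_enough):
    """Run the evolutionary loop on one node until a stopping condition holds.

    Returns (best_individual, reason). best_individual is None unless this
    node itself found an acceptable trajectory.
    """
    for generation in range(max_generations):
        # Another node already found a trajectory and asked this node to stop.
        if stop_requested():
            return None, "stopped by another node"
        best = evolve_one_generation(generation)
        # This node succeeded; the caller would now notify the other nodes.
        if good_enough(best):
            return best, "trajectory found"
    return None, "maximum number of generations reached"


# Usage with stand-in callables: the "individual" is just the generation index.
print(run_node(10, lambda gen: gen, lambda: False, lambda best: best == 3))
```

Checking the stop signal before each generation means a node wastes at most one generation of computation after another node has succeeded.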

Location Prediction and Node Scheduling.
Once a trajectory is found, the model uses it to predict the location where the target will appear at time $t_{i+j}$ by (6), where $0 < j \le L$ and $L$ is the prediction length used in single-step or multistep predictions. To reduce computational cost, we use a square rather than a circle to select nodes around $P(t_{i+j})$. If sensor node $s_k(x_k, y_k)$ satisfies (9), it is selected and activated at time $t_{i+j}$ to detect the target:
$$f(t_{i+j}) - R \le x_k \le f(t_{i+j}) + R, \qquad g(t_{i+j}) - R \le y_k \le g(t_{i+j}) + R, \tag{9}$$
where $R$ is half the side length of the square. The node scheduling algorithm is given in Algorithm 2.
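The square selection rule described above can be sketched as follows. The sketch assumes that the trajectory is available as two callables `f` and `g`, that samples are spaced `dt` apart, and that `R` is half the side length of the square. The function name `schedule_nodes` and the dictionary layout are illustrative choices, not the paper's code.

```python
def schedule_nodes(nodes, f, g, t_i, dt, L, R):
    """Return {time: [ids of nodes to wake]} for the next L prediction steps.

    nodes maps a node id to its (x, y) position; f and g are the x and y
    components of the mined trajectory P(t) = [f(t), g(t)].
    Nodes not listed for a given time can stay asleep.
    """
    schedule = {}
    for j in range(1, L + 1):
        t = t_i + j * dt
        cx, cy = f(t), g(t)  # predicted target location at time t
        schedule[t] = [
            node_id
            for node_id, (x, y) in nodes.items()
            if cx - R <= x <= cx + R and cy - R <= y <= cy + R
        ]
    return schedule


# Usage: a target moving along the x axis at unit speed.
nodes = {"a": (1, 0), "b": (2, 0), "c": (5, 5)}
print(schedule_nodes(nodes, f=lambda t: t, g=lambda t: 0, t_i=0, dt=1, L=2, R=0.5))
```

The square test needs only four comparisons per node, whereas a circular test would need a squared-distance computation, which matches the stated goal of reducing computational cost.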

Incremental Evolution Strategy.
In real-world practice, the trajectory of a target is very complex and variable. Thus, to improve the performance of DI-GEP, the key steps of our policy are as follows.
(1) A trajectory is described as a curve. Any complex curve can be split into multiple simpler curves that are described as line segments, quadric curves, or cubic curves. This method not only ensures flexibility in modeling the trajectory but also keeps the computation cost low.
(2) To capture the variation of a trajectory and accelerate the evolving process, a sliding window policy is proposed to keep a certain number of the latest historical data during individual evolution.
(3) When the obtained function cannot represent the current moving behavior, that is, when the prediction error is greater than a certain threshold, distributed GEP and the node scheduling algorithm should run again with the location data in the sliding window.

Algorithm 1 (DI-GEP).
Input: settings and historical location data
Output: one individual representing a trajectory, or null if no trajectory is found
(1) load historical location data of size $h$ and the initial configuration
(2) randomly create an initial population
(3) decode each chromosome into one ET
(4) calculate each chromosome's fitness by (8)

Algorithm 2: Node scheduling.
Input: functions $P(t)$ found in Algorithm 1 and prediction length $L$
Output: activated sensor nodes
(1) for ($k = 0$; $k < L$; $k$++) {
(2) for each (sensor node $s_j(x_j, y_j)$) {
(3) if ($f(t_{i+k+1}) - R \le x_j$ and $x_j \le f(t_{i+k+1}) + R$
(4) and $g(t_{i+k+1}) - R \le y_j$ and $y_j \le g(t_{i+k+1}) + R$)
(5) $s_j$ is scheduled awake at $t_{i+k+1}$
(6) else
(7) $s_j$ is scheduled asleep at $t_{i+k+1}$
(8) }}

Figure 6: Sliding window.
We assume that a trajectory function $P(t)$ is obtained from historical location data sampled at $t_i, t_{i-1}, \ldots, t_{i-h+1}$. These data fall in the sliding window $w_1$ described in Figure 6. $P(t)$ works well in predicting the future location during the time interval $[t_{i+1}, t_{j-1}]$, that is, $\varepsilon \le \tau$. Suppose that at time $t_j$, $P(t)$ no longer works well because $\varepsilon > \tau$. The trajectory is then recalculated as follows.
(1) The sliding window moves to $w_2$ to include the latest location data. The process can be simplified by sliding $w_1$ to the right each time a prediction is performed, which reduces memory usage.
(2) DI-GEP and the node scheduling algorithm run again.
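The incremental strategy can be sketched as a loop that keeps the current trajectory while its error stays within τ and re-mines it from the sliding window otherwise. The sketch makes one loud substitution: a least-squares straight-line fit (`fit_line`, `mine_trajectory`) stands in for DI-GEP, purely so that the example is runnable. All function names, and the choice to start mining once two samples are available, are assumptions.

```python
import math
from collections import deque


def fit_line(ts, vs):
    """Least-squares line v = a * t + b; a stand-in for the evolved formula."""
    n = len(ts)
    t_mean = sum(ts) / n
    v_mean = sum(vs) / n
    denom = sum((t - t_mean) ** 2 for t in ts)
    num = sum((t - t_mean) * (v - v_mean) for t, v in zip(ts, vs))
    a = num / denom if denom else 0.0
    b = v_mean - a * t_mean
    return lambda t: a * t + b


def mine_trajectory(window):
    """Placeholder for DI-GEP: fit f(t) and g(t) to the samples in the window."""
    ts = [s[0] for s in window]
    f = fit_line(ts, [s[1] for s in window])
    g = fit_line(ts, [s[2] for s in window])
    return lambda t: (f(t), g(t))


def track(samples, h, tau):
    """Process (t, x, y) samples in order; return the times when P(t) was re-mined."""
    window = deque(maxlen=h)  # sliding window of the h latest samples
    trajectory = None
    remined_at = []
    for t, x, y in samples:
        window.append((t, x, y))
        if len(window) < 2:
            continue  # not enough data to fit anything yet
        if trajectory is not None:
            px, py = trajectory(t)
            if math.hypot(px - x, py - y) <= tau:
                continue  # error within tau: keep the current P(t)
        trajectory = mine_trajectory(window)
        remined_at.append(t)
    return remined_at


# Usage: the target moves right until t = 5, then turns and moves up.
samples = [(t, t, 0) for t in range(6)] + [(t, 5, t - 5) for t in range(6, 11)]
print(track(samples, h=3, tau=0.5))
```

In this run the trajectory is mined once at the start and again only around the turn, which is the behavior the sliding window policy aims for: no re-evolution while the current function still predicts well.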
The performance of these algorithms is evaluated on OMNeT++ and Castalia. To compare the performance with other target tracking algorithms, we evaluate DI-GEP, ECPA, and extended Kalman filtering (EKF). The network is considered to have failed once the target can no longer be found, and the tracking task stops at that point. The target signal follows (3), with parameters $d_0 = 0.15$ and $w_t = 15 d_0^{-k}$. The prediction accuracy $\tau$ is 2 meters and $h = 7$.
The lower bound of the sensing range $r_l$ of sensor nodes is set to 7 and the upper bound $r_u$ is set to 10. The sensing energy $e_s$ is 100 μJ, and the transmission (receiving and sending) energy for one packet is $e_t = e_r = 100$ μJ. The initial energy of each sensor node is 100 mJ. Parameters used in distributed GEP are listed in Table 2.

Network Lifetime.
Suppose that different numbers of sensor nodes are uniformly distributed in the grid network. This experiment analyzes the impact of the number of sensor nodes on the network lifetime. The results are given in Figure 7. They show that all three algorithms generally obtain a longer network lifetime as the number of sensor nodes grows. The network lifetime of DI-GEP is on average 35% longer than that of EKF and ECPA.

Energy Consumption.
In this experiment, four hundred sensor nodes are used to monitor the grid network. The sensor nodes are manually distributed at cross points in the grid network.
This experiment analyzes the energy consumption of the three algorithms. The results are given in Figure 8. They show that the total remaining energy decreases as time passes. The total remaining energy never reaches zero because all of these algorithms stop working once the network fails: some sensor nodes exhaust their energy and can no longer communicate with other nodes. Meanwhile, DI-GEP performs better than EKF and ECPA because it consumes less energy over the same running time.

Active Nodes.
This experiment uses a setting similar to that of the previous section and examines the number of active nodes, as shown in Figure 9. DI-GEP outperforms EKF and ECPA in node scheduling because of its better trajectory prediction, so the number of active nodes in DI-GEP is about 25% and 30% less than that in EKF and ECPA, respectively.

Prediction Accuracy.
This experiment uses the same setting as the previous section and shows that the prediction accuracy can heavily affect the network lifetime. The results are shown in Figure 10. Distributed GEP outperforms EKF and ECPA because of its better trajectory prediction, so at the same prediction accuracy, its network lifetime is on average 28% longer than that of EKF and ECPA.

Conclusions
In order to track targets energy efficiently in WSNs, we presented a distributed incremental algorithm based on GEP for target tracking applications, proposed a sliding window policy for distributed GEP to improve the evolution process, proposed a new target tracking model, and gave extensive experimental results to show the good performance of our method.
Future work includes (a) optimizing DI-GEP to capture abrupt moving behaviors, (b) optimizing DI-GEP to suit randomly distributed wireless sensor networks, and (c) considering border intrusion detection to save more energy in the initial state.