Fortifying Intrusion Detection Systems in Dynamic Ad Hoc and Wireless Sensor Networks

We investigate three aspects of dynamicity in ad hoc and wireless sensor networks and their impact on the efficiency of intrusion detection systems (IDSs). The first aspect is magnitude dynamicity, in which the IDS has to efficiently determine whether the changes occurring in the network are due to malicious behaviors or to normal changes in user requirements. The second aspect is nature dynamicity, which occurs when a malicious node continuously switches its behavior between normal and anomalous to cause maximum network disruption without being detected by the IDS. The third aspect, named spatiotemporal dynamicity, happens when a malicious node moves out of the IDS range before the latter can make an observation about its behavior. The first aspect is addressed by defining a normal profile based on the invariants derived from normal node behavior. The second aspect is handled by proposing an adaptive reputation fading strategy that allows fast redemption and fast capture of malicious nodes. The third aspect is addressed by estimating the link duration between two nodes in a dynamic network topology, which allows choosing an appropriate monitoring period. We provide analytical studies and simulation experiments to demonstrate the efficiency of the proposed solutions.


Introduction
Multihop ad hoc wireless networks are sets of nodes equipped with wireless interfaces, in which data are forwarded through multiple nodes to reach the intended destinations. They include many types of networks, such as mobile ad hoc networks (MANETs) [1], wireless sensor networks (WSNs) [2], and vehicular ad hoc networks (VANETs) [3].
In the last decade, there has been substantial research in the area of security in ad hoc and wireless sensor networks [4,5]. Security solutions have been designed with the goal of protecting networks against attacks such as selective forwarding, black hole, wormhole, sinkhole, and energy-exhausting attacks. Prevention mechanisms like key management and authentication, which represent the first line of defense, are not sufficient to provide an efficient security solution. Therefore, there is a need to deploy a second line of defense, namely, an intrusion detection system (IDS).
In general, intrusion detection systems are divided into two major approaches: misuse detection and anomaly detection [6]. Misuse detection performs signature analysis by comparing ongoing activities with patterns representing known attacks; matched activities are labeled as intrusive. The misuse approach is showing its limits, as it cannot detect new attacks. Anomaly detection, on the other hand, builds a profile of normal behavior and attempts to identify patterns or activities that deviate from the normal profile. The main advantage of anomaly detection is that it can detect unknown attacks.
The detection model that we consider in ad hoc and wireless sensor networks is as follows. The IDS is implemented in a distributed manner; each node can act as a monitoring node that observes the behavior of its neighbors. Each observation lasts for a monitoring time interval of duration Δ, called the monitoring period. The IDS can judge whether the monitored node is normal or anomalous after one or multiple consecutive observations. Although intrusion detection systems have received considerable attention in ad hoc and wireless sensor networks [7,8], to the best of our knowledge, there are no studies on the impact of network dynamicity on IDS efficiency and on how the IDS can react or adapt to these changes.
In this paper, we investigate the following three aspects of behavioral dynamicity that occur in the network and can negatively affect the IDS performance and efficiency.
(i) Magnitude Dynamicity. Due to changes in user requirements, a node changes the rate at which it generates data. For instance, a legitimate user may want to change (i.e., increase or decrease) the data collection rate received at the sink node. The challenge facing the IDS here is to remain efficient at detecting attacks while distinguishing between changes due to normal behaviors and changes due to malicious attacks.
(ii) Nature Dynamicity. In some detection models, a monitoring node has to observe the behavior of the monitored node during a set of consecutive monitoring periods before judging whether the monitored node is malicious. A monitored node might evade IDS detection and confuse it by continuously switching its behavior between normal and anomalous. In this case, the malicious node strives to cause network disruption without being detected by the IDS.
(iii) Spatiotemporal Dynamicity. The IDS detection mechanism is based on collecting a set of consecutive observations about the monitored node. An IDS is able to observe the behavior of the monitored node only if the latter stays within the monitoring node's transmission range for a duration exceeding Δ. Knowing this fact, a malicious node can evade IDS detection by moving around in the network at a speed that prevents it from remaining within the monitoring node's transmission range for longer than Δ.
In this paper, we propose a solution for each aspect of dynamicity mentioned above. The contributions of the paper are threefold. Firstly, the magnitude dynamicity aspect is addressed by defining a normal profile based on the invariants derived from normal node behavior. This is achieved by generating a dependency graph consisting of strongly correlated features and then deriving high-level features from the graph. The high-level features are obtained by applying a divide-and-conquer strategy to the maximal cliques algorithm and the maximum weighted spanning tree algorithm. Secondly, to handle the nature dynamicity aspect, we adopt the carrot and stick strategy (i.e., reward generously and punish severely) to prevent a malicious node from evading the IDS. To do so, we propose an adaptive reputation fading strategy that allows fast redemption and fast capture of malicious nodes. Thirdly, we use statistical analysis to estimate the link duration between two nodes in a dynamic network topology. Based on this estimation, the monitoring node chooses an appropriate monitoring period that allows it to observe the monitored node's behavior.
The rest of the paper is organized as follows. In Section 2, we describe the normal profile construction and the feature selection method. Section 3 presents the adaptive reputation fading strategy. In Section 4, we analyze link-node duration in a mobile wireless network and explain how the monitoring time period is estimated. Finally, Section 5 concludes the paper.

Magnitude Dynamicity
2.1. Background. 2.1.1. One-Feature Profile. In the one-feature profile, we use a single feature to describe and detect anomalous behavior. To detect malicious network behavior, a node can measure the features shown in Table 1 [9]. The disadvantage of this profile structure is that one feature must be assigned to each known attack, so the IDS has to measure each feature and check whether it has an anomalous value. As the number of attacks, and thus the size of the rule set, increases, the detection speed of the IDS slows down.
The one-feature profile might fail at distinguishing between normal and anomalous behaviors. Figure 1 shows that using some features individually to describe normal behavior is misleading and might make the detection system falsely accuse a legitimate node of being malicious. Figure 1(a) depicts a tree-based wireless network rooted at the sink and shows the normal traffic rates of the network. The value above each link indicates the flow rate traversing this link. Each node measures the flow rate coming from its upstream neighbors. Figure 1(b) (resp., Figure 1(c)) shows the state of the network when three nodes become compromised and start behaving maliciously by dropping some packets (resp., generating more packets). As the compromised nodes reduce (resp., increase) their sending rate, their respective downstream neighbors also have to reduce (resp., increase) their sending rates accordingly. As a result, the monitoring node will falsely accuse the downstream neighbors of performing a selective forwarding attack (resp., an energy exhaustion attack), and hence a high false positive rate will be observed.

Multifeature Profile.
In the multifeature profile, we describe the normal behavior by a d-feature vector, where each element of the vector represents a feature. In this way, the IDS can determine whether several features together show an anomalous behavior. Experiments have shown that we can obtain better detection accuracy by combining related features rather than using them individually [10]. If the monitoring node in the example of Figure 1 considers two features, (a) the flow entering the monitored node and (b) the flow leaving the monitored node, it will conclude that the downstream nodes are just forwarding what they received from their upstream neighbors, and hence they are not malicious. Loo et al. [11] group the observed data into clusters and use a profile of 12 features to describe the normal profile. To check whether a test instance belongs to a given cluster, they measure the Euclidean distance between the test point and the centroid of the cluster. If this distance is higher than a threshold distance, the test point is considered anomalous. The following example shows that the Euclidean distance between two d-feature profiles reduces the detection accuracy. Let (f_1, f_2) be a vector profile such that each feature of the vector is used to detect one attack, where f_1 and f_2 take values in [0, 10]. Let (10, 10) be the centroid vector. The first and the second attacks are detected when f_1 ≤ 7 and f_2 ≤ 6, respectively. We take the distance between (10, 10) and (7, 6), which is 5, as the threshold distance. Let a test vector be (6, 10); the distance between the two vectors is 4, which is lower than the distance threshold. In this case, the test point will be considered normal, whereas the value of f_1 individually indicates the occurrence of an attack. This example shows that aggregating features through the Euclidean distance results in a loss of detection accuracy.
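The numeric example above can be checked in a few lines of Python (a minimal sketch; the feature names f_1 and f_2 and all values are those of the example):

```python
import math

def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

centroid = (10, 10)
# Threshold distance: distance from the centroid to the attack
# boundary point (7, 6), i.e., f_1 <= 7 and f_2 <= 6 signal attacks.
threshold = euclidean(centroid, (7, 6))    # 5.0

test_point = (6, 10)                       # f_1 = 6 already signals the first attack
dist = euclidean(centroid, test_point)     # 4.0 < threshold

# The aggregate distance test wrongly labels the point normal.
print(threshold, dist, dist <= threshold)  # 5.0 4.0 True
```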
In [9,12], the normal profile of a monitored node i is defined by a d-feature vector X_i = (x_1, ..., x_d). If a node monitors a set of n nodes, it forms a matrix X = (X_1, ..., X_n)^T. Both schemes assume that all feature vectors X_i follow the same multivariate normal distribution with mean μ and variance-covariance matrix M. Node i is considered suspicious if the Mahalanobis distance between X_i and the center of the set X is greater than a predefined threshold. The authors of both works use the orthogonalized Gnanadesikan-Kettenring estimation to find the center of the set X. Let μ and M denote the simple mean and the simple variance-covariance of X, such that μ = (1/n) Σ_{i=1}^{n} X_i and M = (1/(n − 1)) Σ_{i=1}^{n} (X_i − μ)(X_i − μ)^T. The Mahalanobis distance between X_i and the vector μ is given by sqrt((X_i − μ)^T M^{−1} (X_i − μ)). The Mahalanobis distance differs from the Euclidean distance in that it takes into account the correlations between features. In [12], nodes are evaluated in terms of packet dropping rate, packet sending rate, forwarding delay time, and node readings. In [9], the attacks are detected by monitoring packet sending rate, packet dropping rate, packet mismatch rate, packet receiving rate, and received signal strength. As stated in [13], the works of [9,12] face two major criticisms: (1) the circumstances under which the assumption of multivariate normal distribution holds are not explained, and (2) network features such as packet sending, packet dropping, and packet receiving rates do not follow the normal distribution for tree-based routing protocols.
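A minimal numpy sketch of the Mahalanobis test described above (the sample data are synthetic and purely illustrative; this is not the implementation of [9,12]):

```python
import numpy as np

def mahalanobis(x, X):
    """Mahalanobis distance from vector x to the center of sample matrix X
    (rows of X are the d-feature vectors of the monitored nodes)."""
    mu = X.mean(axis=0)              # simple mean
    M = np.cov(X, rowvar=False)      # simple variance-covariance, 1/(n-1) normalization
    diff = x - mu
    return float(np.sqrt(diff @ np.linalg.inv(M) @ diff))

# Two strongly correlated features: packets received vs. packets forwarded.
rng = np.random.default_rng(0)
recv = rng.normal(100, 5, size=200)
X = np.column_stack([recv, recv + rng.normal(0, 1, size=200)])

print(mahalanobis(X.mean(axis=0), X))           # 0.0: the center itself
print(mahalanobis(np.array([100, 60]), X) > 3)  # True: a node dropping packets stands out
```

Because the two features move together, a vector that receives 100 packets but forwards only 60 is far from the data cloud in the Mahalanobis sense even though each coordinate is individually plausible.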

Profile Construction Based on Strongly Correlated Features.
When it comes to comparing distances, the Mahalanobis distance is a powerful technique, as it takes the covariances into account, which leads to elliptic decision boundaries in the 2D space. The Euclidean distance, by contrast, builds circular boundaries and assumes equal variances of the features, so the Mahalanobis distance is more appropriate for multivariate data.
In this paper, we take a novel approach to select relevant features and construct the normal profile vector. We do not assume a multivariate normal distribution, and we feed only strongly correlated features to the distance measure, unlike the Mahalanobis distance, which considers the correlations between all features.
In the training phase, we investigate the significant associations between features. We are interested in identifying the level of correlation between features using Pearson's correlation coefficient, which measures the strength of the linear association between them. Pearson's correlation coefficient between two feature vectors X and Y is defined by r(X, Y) = cov(X, Y)/(σ_X σ_Y), where cov(X, Y) is the covariance of X and Y and σ_X, σ_Y are their standard deviations. A value of r close to +1 (resp., −1) indicates a direct/increasing (resp., inverse/decreasing) linear relationship. Indeed, a strong relationship between variables is reflected by values close to the limits (−1 ≤ r ≤ −0.9 or 0.9 ≤ r ≤ +1) [14]. Pearson's correlation coefficient takes the value 0 for independent variables. However, the reverse is not true, since this coefficient captures only linear dependencies between variables.
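A small, self-contained computation of Pearson's coefficient (the feature values are illustrative; any statistics library computes the same quantity):

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient r(X, Y) = cov(X, Y) / (sigma_X * sigma_Y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

rcv = [10, 12, 15, 20, 22]
fwd = [9, 11, 14, 19, 21]   # forwarding tracks reception exactly (one packet consumed)
print(pearson(rcv, fwd))    # 1.0: strongly correlated features
```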
We build a graph G whose vertices are the features and whose edges are weighted by the pairwise correlation coefficients. The graph G^[Th] induced from G by keeping only the edges whose weight is at least a threshold Th might be composed of a set of disjoint connected components. The closer Th is to 1, the stronger the correlations that remain in G^[Th].
We aim at finding the set of features that increase and decrease together in order to avoid the missed detection problem of [11]. The best way to do so is to extract from G the set of cliques composed of strongly correlated features. One of the widely adopted solutions [15] to compute the maximal cliques of an arbitrary graph of n vertices runs in time O(3^(n/3)) = O(1.44^n). Instead of applying the maximal cliques algorithm on the graph G, we propose to adopt the divide-and-conquer strategy by applying this algorithm on each connected component of the subgraph G^[Th]. A clique CL_i^[Th] = (V_i^[Th], E_i^[Th]) (i ≥ 1) of a graph G^[Th] is a set of vertices V_i^[Th] ⊆ V^[Th] such that all pairs of vertices in V_i^[Th] are adjacent. This strategy significantly reduces the computational complexity of finding maximal strongly correlated cliques. Let us consider that G^[Th] is composed of d vertices belonging to a set of m connected components, where each connected component i = 1, ..., m is composed of n_i vertices. Some components are singleton vertices, some are composed of two vertices, and the rest are composed of more than two vertices. The computational complexity incurred by applying the maximal cliques algorithm on the whole graph G is O(1.44^d). By applying the same algorithm on each connected component of G^[Th], we notice that there is no need to apply it on isolated vertices, and that components of two vertices are cliques by definition; hence we get the computational complexity Σ_{i: n_i > 2} 1.44^{n_i}. It is obvious that applying the divide-and-conquer strategy can significantly reduce the running time of the algorithm and make it suitable for resource-constrained nodes.
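The divide-and-conquer strategy can be sketched as follows, assuming an adjacency-set representation of G^[Th] (the classic Bron-Kerbosch procedure stands in for the maximal cliques algorithm of [15]; the feature names are illustrative):

```python
def connected_components(adj):
    """Split the thresholded graph G^[Th] into connected components."""
    seen, comps = set(), []
    for v in adj:
        if v in seen:
            continue
        stack, comp = [v], set()
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def bron_kerbosch(R, P, X, adj, out):
    """Classic maximal-clique enumeration, O(3^(n/3)) in the worst case."""
    if not P and not X:
        out.append(R)
        return
    for v in list(P):
        bron_kerbosch(R | {v}, P & adj[v], X & adj[v], adj, out)
        P.remove(v)
        X.add(v)

def maximal_cliques_divide_conquer(adj):
    cliques = []
    for comp in connected_components(adj):
        if len(comp) <= 2:
            cliques.append(comp)  # singletons and pairs are cliques by definition
        else:
            bron_kerbosch(set(), set(comp), set(), adj, cliques)
    return cliques

# Toy correlation graph: a {GEN, SENT} pair and a {RCV, FWD, LOSS} triangle.
adj = {"GEN": {"SENT"}, "SENT": {"GEN"},
       "RCV": {"FWD", "LOSS"}, "FWD": {"RCV", "LOSS"}, "LOSS": {"RCV", "FWD"}}
print(maximal_cliques_divide_conquer(adj))
```

The exponential search runs only on components with more than two vertices, exactly as argued above.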
Let E be the set of edges belonging to all the cliques in G^[Th], with |E| = d_h. For each edge (f_i, f_j), which is the kth element of E (k = 1, ..., d_h), we define a high-level feature z_k = f_i/f_j. From the training dataset T, we derive its high-level training dataset T_h as follows: for each d-profile vector X ∈ T, we derive its d_h-profile high-level vector Z = (z_1, ..., z_{d_h}), such that z_k = f_i/f_j and f_j ≠ 0. If f_j = 0, the high-level vector Z is removed from the training dataset T_h. This choice is justified by the fact that the stronger the correlation between f_i and f_j is, the more the data instances of (f_i, f_j) fall on the same straight line f_i = a·f_j + b, where a is the slope and b is the intercept.
The high-level features belonging to the same clique CL_i^[Th] are grouped into a single vector Z_i. We consider that c cliques are obtained from G^[Th]; thus, the normal profile is defined as the set of vectors Z_i (i = 1, ..., c). To further reduce the number of features in each vector Z_i, we apply the maximum weighted spanning tree algorithm on each clique. To do so, we apply Kruskal's algorithm, originally used to obtain the minimum spanning tree, after negating the weight of each edge [16]. The high-level features whose edges do not belong to the tree are removed from the normal profile. The resulting profile is called the minimum normal profile. The time complexity of the maximum weighted spanning tree is O(|E| log |V|), the complexity of Kruskal's algorithm. As with the maximal cliques algorithm, the maximum weighted spanning tree is applied only on cliques with more than two vertices. The use of the maximum weighted spanning tree is justified by the fact that all the low-level features of each clique in G^[Th] are strongly correlated with one another: in each clique, if x and y are strongly correlated and y and z are strongly correlated, then x and z are strongly correlated. Hence, we can remove the redundant edge (x, z) from the clique.
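The pruning step can be sketched as follows, assuming the edge weights are the correlation coefficients (Kruskal's algorithm on negated weights, as described above; the clique and weights are illustrative):

```python
def max_weight_spanning_tree(vertices, weighted_edges):
    """Kruskal's algorithm run on negated weights, i.e., a maximum weighted
    spanning tree; returns the edges whose high-level features are kept."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path compression
            v = parent[v]
        return v

    kept = []
    # Negating each weight turns Kruskal's minimum spanning tree into a maximum one.
    for w, u, v in sorted((-w, u, v) for u, v, w in weighted_edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            kept.append((u, v))
    return kept

# A 3-vertex clique of strongly correlated features: the weakest edge is redundant.
clique = ["RCV", "FWD", "LOSS"]
edges = [("RCV", "FWD", 0.99), ("FWD", "LOSS", 0.95), ("RCV", "LOSS", 0.93)]
print(max_weight_spanning_tree(clique, edges))  # keeps 2 edges, drops (RCV, LOSS)
```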

Proposition 1. For any dataset of d low-level features, the number of high-level features induced by the graph-based generation method is upper-bounded by d − c, where c is the number of cliques in G^[Th].

Proof. Consider V^[Th] ⊆ V; that is, in the worst case, every low-level feature belongs to some clique CL_i^[Th] (i ≥ 1). As a result, Σ_{i=1}^{c} |V_i^[Th]| ≤ d. The number of edges induced by executing the maximum weighted spanning tree on the clique CL_i^[Th] is |V_i^[Th]| − 1. Thus, the number of edges (i.e., high-level features) induced by executing the maximum weighted spanning tree on all the cliques of G^[Th] is upper-bounded by Σ_{i=1}^{c} (|V_i^[Th]| − 1) ≤ d − c.

Detection Process.
Each node constructs its local dataset, represented by an N × d matrix (i.e., N vector instances and d features). It then extracts c cliques from this dataset, as shown above, as well as its minimum profile composed of c vectors Z_i of size d_i, where i = 1, ..., c. The node computes the centroid vector C_i over all the N instances of Z_i.
To check whether a profile X is normal or anomalous, we derive from X its corresponding high-level profile Z and execute the pseudocode depicted in Algorithm 1. In the algorithm, Dis denotes the Euclidean distance between two vectors. Low_i and Up_i denote the lowest and highest values obtained from estimating Dis(Z_i, C_i) over all the N instances of Z_i.
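Algorithm 1 itself is not reproduced in this excerpt; the following is a minimal sketch consistent with its description (the centroid and the [Low, Up] bounds are illustrative placeholders for values learned during training):

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def is_anomalous(profile, centroids, low, up):
    """Flag a high-level profile as anomalous if, for any clique i, the
    distance Dis(Z_i, C_i) falls outside the [Low_i, Up_i] range learned
    during the training phase."""
    for i, z_i in enumerate(profile):
        d = euclidean(z_i, centroids[i])
        if not (low[i] <= d <= up[i]):
            return True
    return False

# One clique with two high-level (ratio) features; bounds learned in training.
centroids = [(1.0, 1.0)]
low, up = [0.0], [0.2]
print(is_anomalous([(1.02, 0.98)], centroids, low, up))  # False: within bounds
print(is_anomalous([(0.4, 1.0)], centroids, low, up))    # True: drops packets
```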

Simulation Results.
We study the performance of the proposed IDS using the GloMoSim simulator [17]. Each node sends one packet/sec toward the sink. A watchdog is implemented at each node, and its role is to monitor the network activities of all the node's neighbors. Every 10 seconds (i.e., one time period), a monitoring node measures the feature vector of its monitored node. After a training phase of T time periods, the testing phase lasts for 1800 seconds. The role of the IDS implemented at a monitoring node is not just to detect whether a neighboring node is malicious but also to detect whether that neighbor is malicious during a given time period. We evaluate the performance of the IDS using two metrics: detection rate and false positive rate. We select the following five quantitative features: (i) number of generated packets (GEN), (ii) number of received packets (RCV), (iii) number of forwarded packets (FWD), (iv) number of sent packets (SENT), and (v) number of lost packets (LOSS).
We then generate the correlation matrix Ω as well as the minimum normal profile, obtained after performing the maximal cliques algorithm and the maximum weighted spanning tree algorithm, as shown in Figure 3. Figure 4 shows the detection rate of the proposed IDS as a function of the dropping probability. The first observation that we can draw from the figure is that the detection rate is 100% when the dropping probability is higher than 0.05, and it is under 100% when the dropping probability is ≤ 0.02. This can be explained as follows: under very low dropping probabilities, the malicious nodes drop packets at low intensities and their activities become unnoticeable. This happens when the dropping probability becomes very close to or less than the normal packet loss, which is at most 2% during each time period. Figure 5 shows the detection rate of the IDS as a function of the training period. The results are presented under the following levels of dropping probability: p = 1, 0.5, 0.1, 0.05, 0.01. They show that the detection rate does not depend on the training period but on the dropping probability. Under high dropping probabilities, the detection rate is 100% for all training periods. Under low dropping probabilities, the detection rate decreases as the malicious behavior becomes very close to the normal one.
Figure 6 shows the false positive rate of the IDS as a function of the training period under the following levels of dropping probability: p = 0.8, 0.5, 0.1, 0.05, 0.03, 0.01. We can notice that the false positive rate becomes 0 when the training period T = 30 for all p > 0.02. At T = 30, the IDS has learned all the possible instances of the normal profile and can accurately distinguish between normal and anomalous traffic. When T < 30, the IDS still has not learned all the instances of the normal profile. In other words, the normal profiles that are not observed during the training phase will be considered anomalous during the testing phase. Thus, the false positive rate depends in this case on the number of times unlearned normal profiles are observed during the testing phase, which itself depends on the number of lost packets that are due to (1) normal packet loss and (2) dropping activities. As packet loss is an event that occurs randomly, the false positive curves are also random when T < 30. For p = 0.01, the false positive rate becomes 0 only when T = 40. Given that the behavior of the malicious node becomes very close to that of a legitimate node, the IDS needs more time to learn about new instances of the normal profile.

Nature Dynamicity
3.1. Background: Constant Fading Reputation Strategy. Reputation is defined as the general opinion of a society of nodes towards a certain node in a specific domain of interest, and it is the global perception of the future behavior of this node. In an IDS based on multiple observations, the IDS collects a series of consecutive observations, each of which occurs during a separate monitoring period.
Since reputation aggregates past experiences and dynamically evolves, it is similar to Bayesian analysis, a statistical procedure that estimates the parameters of an underlying distribution based on observations. Starting with a prior distribution, which is the initial state before any observation is made, Bayesian analysis continuously takes new experiences into account and derives a posterior probability [18]. One of the distributions used in Bayesian analysis is the Beta distribution.
The Beta distribution has been recognized as a useful formal tool to model reputation [18-20]. A reputation value assumes a tuple (α, β), with α, β ≥ 1, such that α and β represent positive and negative observations, respectively.
The Beta distribution has the following probability density function (PDF): f(p | α, β) = (Γ(α + β)/(Γ(α)Γ(β))) p^(α−1) (1 − p)^(β−1), where 0 ≤ p ≤ 1 and α, β ≥ 0.
The reputation, denoted by R, is defined as the expectation (denoted by E) of the Beta distribution, and it takes the following simple form: R = E[Beta(α, β)] = α/(α + β). We model the reputation of a node with a Beta distribution Beta(α, β). Initially, α = 1 and β = 1.
The standard Bayesian procedure is as follows. Initially, the prior is Beta(1, 1), the uniform distribution on [0, 1]. Then, when a new observation is made, say with r observed correct behaviors and s observed misbehaviors, the prior is updated according to α := α + r and β := β + s. The reputation relies on the node's direct observations. When the monitoring node makes one individual observation about the monitored node, it updates α and β as follows.
(i) If the observation is qualified as misbehavior, β is set to β + 1.
(ii) If the observation is qualified as correct behavior, α is set to α + 1.
The standard Bayesian method is modified in [19] to give less weight to observations received in the past, so as to allow reputation fading and prevent any node from capitalizing on its previous good behavior forever. To achieve this aim, a discount factor for past observations is used. When a new observation (r, s) is made, α and β are updated as follows: α := w·α + r and β := w·β + s. The weight w is a constant discount factor for past observations, which serves as the fading mechanism. We refer hereafter to the reputation system described above as the constant fading reputation strategy.
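The constant fading strategy can be sketched as follows (a minimal sketch; the mapping of α to correct behaviors and β to misbehaviors follows the reconstruction in this section):

```python
class BetaReputation:
    """Beta-based reputation with constant fading: alpha accumulates correct
    behaviors, beta accumulates misbehaviors, and past observations are
    discounted by a constant weight w on every update."""

    def __init__(self, w):
        self.w = w
        self.alpha = 1.0  # prior Beta(1, 1): the uniform distribution on [0, 1]
        self.beta = 1.0

    def observe(self, correct, misbehaved):
        self.alpha = self.w * self.alpha + correct
        self.beta = self.w * self.beta + misbehaved

    @property
    def reputation(self):
        # R = E[Beta(alpha, beta)] = alpha / (alpha + beta)
        return self.alpha / (self.alpha + self.beta)

rep = BetaReputation(w=0.8)
print(rep.reputation)             # 0.5: no evidence yet
for _ in range(20):
    rep.observe(correct=0, misbehaved=1)  # persistent misbehavior
print(rep.reputation < 0.1)       # True: the reputation collapses
```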

Adaptive Fading Reputation Strategy.
The constant fading reputation mechanism uses the same discount factor for all types of observations and at all times. The higher (resp., lower) the value of w is, the slower (resp., quicker) the histories are forgotten. By knowing the value of w, a malicious node can evade IDS detection by misbehaving for a given time and then going back to normal behavior. Under a high discount factor, a change of node behavior (from well-behaved to misbehaved and vice versa) will be detected only after a long time. During this time, well-behaved nodes can count on their good histories and act maliciously. In addition, misbehaved nodes will have to wait a long time to redeem themselves. On the other hand, a low discount factor permits quicker detection and redemption of nodes. However, it might raise false alarms, especially when network faults and attacks share the same failure symptoms. For instance, a misbehavior is detected if the observed node is not forwarding a packet. This rule is set to detect black hole and selective forwarding attacks, yet it also fires when packets are not forwarded due to collisions, which means that a well-behaved observed node might be falsely considered malicious.
To deal with this issue, we propose an adaptive fading reputation mechanism. This mechanism uses the carrot and stick strategy; that is, it rewards the well-behaved node and punishes the misbehaved node. Let w_p(t) and w_n(t) denote the discount factors applied to the positive and negative histories, respectively, and let th be a reputation threshold.
(i) Reward Strategy. It is applied when the reputation R ≥ th. The IDS forgets the negative history more quickly than the positive one (i.e., w_n(t) < w_p(t)); this strategy is used when a node is well-behaved.
(ii) Punishment Strategy. It is applied when the reputation R < th. The IDS forgets the positive history more quickly than the negative one (i.e., w_p(t) < w_n(t)); this strategy is used when a node is misbehaved.
Formally, w_p(t) and w_n(t) evolve within the following bounds. PR_max and PR_min are the upper and the lower bounds of the positive discount factor, respectively, under the reward strategy, and NR_max and NR_min are the corresponding bounds of the negative discount factor. PP_max and PP_min are the upper and the lower bounds of the positive discount factor, respectively, under the punishment strategy. NP_max and NP_min are the upper and the lower bounds of the negative discount factor, respectively, under the punishment strategy.
For new nodes, positive and negative histories are kept with a discount factor equal to 1 as long as the number of observations is less than a given value, named the experience threshold.
From the above upper and lower bounds, we define the following two distance metrics.
(i) Punish-to-Reward (PTR) Distance. It is defined by PR_min − PP_max, and it shows to what extent the node is rewarded by the IDS when it transits from the misbehaved state to the well-behaved state; that is, the higher the PTR is, the slower the positive histories are forgotten.
(ii) Reward-to-Punish (RTP) Distance. It is defined by NP_min − NR_max, and it shows to what extent the node is punished by the IDS when it transits from the well-behaved state to the misbehaved state; that is, the higher the RTP is, the slower the negative histories are forgotten.
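A sketch of how the carrot-and-stick selection of discount factors might look (the exact forms of w_p(t) and w_n(t) are not reproduced in this excerpt, so the midpoint of each bound interval is used as a placeholder; the bound values are illustrative):

```python
def discount_factors(reputation, th, bounds):
    """Pick the positive/negative discount factors (w_p, w_n) according to the
    carrot-and-stick rule: reward strategy when R >= th, punishment otherwise.
    `bounds` holds the PR/NR/PP/NP (min, max) pairs; the midpoint of each
    interval stands in for the paper's time-varying w_p(t), w_n(t)."""
    mid = lambda lo_hi: (lo_hi[0] + lo_hi[1]) / 2
    if reputation >= th:
        # Reward: forget negative history faster than positive (w_n < w_p).
        return mid(bounds["PR"]), mid(bounds["NR"])
    # Punishment: forget positive history faster than negative (w_p < w_n).
    return mid(bounds["PP"]), mid(bounds["NP"])

bounds = {"PR": (0.8, 0.9), "NR": (0.1, 0.2),   # reward strategy bounds
          "PP": (0.1, 0.2), "NP": (0.8, 0.9)}   # punishment strategy bounds
w_p, w_n = discount_factors(0.9, th=0.5, bounds=bounds)
print(w_n < w_p)  # True: a well-behaved node's negative history fades fast
w_p, w_n = discount_factors(0.2, th=0.5, bounds=bounds)
print(w_p < w_n)  # True: a misbehaved node's positive history fades fast
```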

Performance of Adaptive Discount Factor Strategy.
We evaluate the performance of the constant and adaptive discount factor strategies in terms of detection time. To do so, we implement three behavioral models.
(i) Deterministic redemption model: a node with reputation R = 0 behaves correctly in the network.
(ii) Deterministic evasion model: a node with reputation R = 1 behaves maliciously in the network.
(iii) Probabilistic evasion model: the node's behavior is modeled with a two-state Markov chain, as depicted in Figure 8. In state W, the node is well-behaved, and in state M, the node is misbehaved. Initially, the node's reputation is R = 1. The node transits towards state M with probability P_e and towards state W with probability P_w, such that P_e + P_w = 1. P_e is called the evasion probability. The time spent in state W and in state M is the monitoring time period.
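The probabilistic evasion model pitted against the constant fading strategy can be simulated directly (a minimal sketch; with P_e = 1 it degenerates to the deterministic evasion model):

```python
import random

def observations_to_detect(p_e, w, threshold=0.1, seed=1, cap=10000):
    """Simulate the probabilistic evasion model against the constant fading
    strategy: in each monitoring period the node misbehaves with evasion
    probability p_e. Return how many observations the IDS needs before the
    reputation R = alpha / (alpha + beta) drops to the detection threshold."""
    rng = random.Random(seed)
    alpha = beta = 1.0  # prior Beta(1, 1)
    for n in range(1, cap + 1):
        if rng.random() < p_e:
            alpha, beta = w * alpha, w * beta + 1      # misbehavior observed
        else:
            alpha, beta = w * alpha + 1, w * beta      # correct behavior observed
        if alpha / (alpha + beta) <= threshold:
            return n
    return cap

# Deterministic evasion (p_e = 1) against a slow-fading IDS (w = 0.8).
print(observations_to_detect(p_e=1.0, w=0.8))  # 5
# A node misbehaving only half the time is harder to pin down.
print(observations_to_detect(p_e=0.5, w=0.8))
```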
The parameters for the experiment are shown in Table 2. We define three settings for the adaptive fading reputation.
(i) Setting 1: PTR and RTP are high; for example, they equal 0.7.
(ii) Setting 2: PTR and RTP are medium; for example, they equal 0.3.
(iii) Setting 3: PTR and RTP are low; for example, they equal 0.1.
As for the constant fading reputation, we define three levels of discount factor: w = 0.2, 0.5, 0.8.
We study the evolution of reputation over time when applying the constant and adaptive discount factors. In Figure 9(a), the convergence time increases as w increases. This is because higher (resp., lower) values of w mean that the negative histories are forgotten at a slower (resp., faster) rate, which leads to a longer (resp., shorter) time to converge to R = 1. In Figure 9(b), we observe that the deterministic redemption model under the adaptive discount factor strategy requires less convergence time than under the constant one. It ranges between 3 and 9 observations under setting 1 and setting 3, respectively. The reason is that a node under setting 1 is rewarded more generously as long as it is well-behaving; that is, its positive histories are forgotten more slowly than those of setting 2 and setting 3.
In Figure 10, we also notice that a malicious node that follows the deterministic evasion model is detected more quickly when the adaptive discount factor strategy is applied. The time to converge to R = 0 is between 3 and 9 observations under the adaptive discount factor strategy and between 4 and 14 observations under the constant discount factor strategy. For instance, let R = 0.1 be the boundary between malicious and normal behavior; the malicious node can evade IDS detection only for the time required to collect 3 observations if the IDS adopts the adaptive discount factor strategy under setting 3. Under the constant discount factor strategy with w = 0.8, the IDS can detect the malicious node after a period of 5 observations. By knowing the required number of observations to detect a malicious node, the latter can adopt the probabilistic evasion model, which does discontinuous harm to the network to confuse the IDS and hence evade detection. Figures 11, 12, and 13 show that the adaptive discount factor strategy can quickly detect this type of behavior. In the figures, we consider that a node is malicious when R = 0.1. When the evasion probability P_e = 0.5, the adaptive strategy succeeds at detecting the malicious node after between 2 and 37 observations. On the other hand, the malicious node can evade an IDS adopting the constant strategy for 751 observations when w = 0.8. This value decreases to 10 and 2 when w = 0.5 and w = 0.2, respectively. When P_e = 0.6, the detection time decreases to 40 and 27 under w = 0.8 and setting 3, respectively. When P_e is between 0.7 and 0.9, the adaptive strategy (resp., constant strategy) achieves a detection time between 2 and 4 (resp., between 2 and 5) observations.

Spatiotemporal Dynamicity
A monitoring node can make at least one observation about a monitored node if the wireless link between them lasts longer than the monitoring period Δ. A malicious node that knows this fact can move around in the network so that the links it creates with its neighbors last less than Δ.
As shown in Figure 14, the nodes start operating at time t_0. A wireless link between the monitoring node and the monitored node is created at time t_1, when the monitored node comes within the transmission range of the monitoring node. The link is lost either (1) when the monitored node moves out of the transmission range at time t_2 or (2) when the monitored node runs out of battery power at time t_3. Therefore, the monitoring node estimates the link-node lifetime as min(t_2 − t_1, t_3 − t_1), where (t_2 − t_1) is the estimate of the link lifetime and (t_3 − t_1) is the residual node lifetime after the node has been in existence for (t_1 − t_0) time units.
In this section, we statistically analyze the link-node lifetime distribution. Based on this analysis, we choose appropriate values for the monitoring period so that a mobile monitored node cannot evade IDS detection. We use the random waypoint mobility model, in which each mobile node randomly selects a location within an area of 100 m × 100 m and moves toward it with a random speed uniformly distributed between 0 and a maximum speed vmax; it then stays stationary for a pause time of 1 second before moving to a new random location. In our analysis, we consider two numbers of nodes (NN), namely, 10 and 20 nodes.
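This setup can be reproduced with a small time-stepped simulation. The sketch below follows the text's area (100 m × 100 m), speed range, and 1 s pause; the transmission range (25 m), the time step, and the small minimum speed are our own assumptions, not values from the text.

```python
import math
import random

AREA = 100.0   # side of the square deployment area (m), as in the text
PAUSE = 1.0    # pause time (s), as in the text
TR = 25.0      # transmission range (m) -- assumed; the text does not state it
DT = 0.5       # simulation time step (s) -- an implementation choice

def random_waypoint(vmax, sim_time, rng):
    """Yield time-stepped positions of one node under the random waypoint
    model.  A small minimum speed avoids near-infinite travel legs."""
    x, y = rng.uniform(0, AREA), rng.uniform(0, AREA)
    t = 0.0
    while t < sim_time:
        tx, ty = rng.uniform(0, AREA), rng.uniform(0, AREA)
        speed = rng.uniform(0.1, vmax)
        steps = max(1, int(math.hypot(tx - x, ty - y) / (speed * DT)))
        for i in range(1, steps + 1):          # move toward the waypoint
            yield (x + (tx - x) * i / steps, y + (ty - y) * i / steps)
            t += DT
        x, y = tx, ty
        for _ in range(int(PAUSE / DT)):       # stay stationary during the pause
            yield (x, y)
            t += DT

def link_durations(n_nodes, vmax, sim_time, seed=1):
    """Collect the durations of all links that form and later break."""
    rng = random.Random(seed)
    tracks = [list(random_waypoint(vmax, sim_time, rng)) for _ in range(n_nodes)]
    horizon = min(len(tr) for tr in tracks)
    durations, up_since = [], {}
    for k in range(horizon):
        for i in range(n_nodes):
            for j in range(i + 1, n_nodes):
                d = math.hypot(tracks[i][k][0] - tracks[j][k][0],
                               tracks[i][k][1] - tracks[j][k][1])
                if d <= TR and (i, j) not in up_since:
                    up_since[(i, j)] = k       # link comes up
                elif d > TR and (i, j) in up_since:
                    durations.append((k - up_since.pop((i, j))) * DT)
    return durations                           # links still up at the end are discarded

durations = link_durations(n_nodes=10, vmax=20.0, sim_time=200.0)
```

The resulting list of durations is the kind of sample that the histogram-and-fit analysis below operates on.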
4.1. Link Lifetime Distribution.
We obtain from our simulation the frequency of link durations and plot it as a histogram, as shown in Figures 15 and 16. The EasyFit software [21,22] is used to measure the compatibility of a random sample with theoretical probability distribution functions. As shown in the figures, the software approximates the simulation data by a two-parameter Weibull distribution [23] with shape 1.031 and scale 28.74 (resp., shape 1.029 and scale 32.85) when vmax = 20 and NN = 10 (resp., NN = 20).
The two-parameter Weibull distribution with shape α and scale β has the PDF f(t) = (α/β)(t/β)^(α−1) exp(−(t/β)^α) for t ≥ 0. Based on the properties of the Weibull distribution, the mean (expected value) is E[T] = β Γ(1 + 1/α) (12). On the other hand, Samar and Wicker [24,25] describe the expected link lifetime as a function of the node velocity, say v1, in an expression (13) involving the radius r of the circle centered at the node and the direction of motion; v1 is uniformly distributed (in meters/second), and the full expression is derived in [24,25].
Since (12) and (13) both describe the expected link lifetime, we can equate them and derive the scale parameter β as a function of the velocity v1 (15). Simulations have been conducted to compare the theoretical β obtained from (15) with the approximated one obtained by fitting a Weibull distribution to the simulation data, as shown in Table 3. The results show that the Weibull distribution fits the simulation data well.
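The Weibull density and its mean are standard and easy to evaluate; the sketch below uses the shape/scale values reported above for vmax = 20 (reading the two fitted EasyFit parameters as shape and scale is our interpretation of the garbled symbols).

```python
import math

def weibull_pdf(t, shape, scale):
    """Two-parameter Weibull density f(t) = (a/b)(t/b)**(a-1) exp(-(t/b)**a)."""
    if t < 0:
        return 0.0
    z = t / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-z ** shape)

def weibull_mean(shape, scale):
    """Expected value E[T] = scale * Gamma(1 + 1/shape)."""
    return scale * math.gamma(1 + 1 / shape)

# Fitted link-lifetime parameters for vmax = 20 (NN = 10 and NN = 20).
print(weibull_mean(1.031, 28.74))   # ~28.4 s expected link lifetime
print(weibull_mean(1.029, 32.85))   # ~32.5 s expected link lifetime
```

With shape close to 1 the distribution is nearly exponential, which matches the rapidly decaying histograms typically observed for link durations under random waypoint mobility.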

Residual Node Lifetime Distribution.
We assume that the node lifetime follows an exponential distribution with parameter λ; this distribution is the one commonly used to model "time to failure" in reliability engineering. We take λ to be the rate at which the node's battery is discharged. The probability density function is then f(t) = λ e^(−λt), t ≥ 0. The probability density function of the residual node lifetime for a node of age a is given by [26] f_R(t) = f(a + t) / (1 − F(a)), where F is the cumulative distribution function (CDF) of the exponential distribution. Because the exponential distribution is memoryless, f_R(t) = λ e^(−λt); thus the residual node lifetime also follows an exponential distribution, and its expected value is E[R] = 1/λ (18). Consequently, the CDF of the residual node lifetime R is F_R(t) = 1 − e^(−λt).

The approximated density function for the combination of the link lifetime L and the residual node lifetime R is a Phased Bi-Weibull distribution [27]. The EasyFit software [22] approximates the simulation data by the Phased Bi-Weibull distribution, as shown in Figure 17 (resp., Figure 18), with first-phase parameters 0.87118, 19.482, and 0 and second-phase parameters 0.68969, 31.875, and 3 (resp., first-phase parameters 0.90481, 22.976, and 0 and second-phase parameters 0.71509, 14.819, and 4).

Remark 2 (see [28]). For real values a, b ∈ R, min(a, b) = a + b − max(a, b).
The result of this remark is extended to random variables by the following theorem.
Lemma 4 (see [28]). Given two real-valued continuous random variables X, Y : Ω → R, the expected value of the maximum of the two variables is E[max(X, Y)] = E[X] + E[Y] − E[min(X, Y)].

Based on Theorem 3 and Lemma 4, the expected link-node lifetime is E[min(L, R)] = E[L] + E[R] − E[max(L, R)], where E[L] is given in (12) and E[R] in (18). Figure 19 shows the expected link-node lifetime obtained from simulation as a function of node velocity. The results show that the expected link-node lifetime decreases rapidly as velocity increases, with a particularly sharp drop when vmax ∈ [1, 5]. They also show that the expected link-node lifetime becomes longer under higher network density: a node then shares links with a larger number of neighbors, so links with longer durations are more likely to be observed.
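The min/max identity used in this derivation can be checked numerically. The sketch below draws link lifetimes from the fitted Weibull distribution and residual node lifetimes from an exponential distribution; the battery-discharge rate λ = 1/100 per second is an assumed value, not one given in the text.

```python
import random

rng = random.Random(42)
SCALE, SHAPE = 28.74, 1.031   # link-lifetime Weibull fit (vmax = 20, NN = 10)
RATE = 1 / 100.0              # assumed battery-discharge rate lambda (1/s)

n = 50_000
link = [rng.weibullvariate(SCALE, SHAPE) for _ in range(n)]   # link lifetimes L
node = [rng.expovariate(RATE) for _ in range(n)]              # residual lifetimes R

mean = lambda xs: sum(xs) / len(xs)
e_min = mean([min(l, r) for l, r in zip(link, node)])
e_max = mean([max(l, r) for l, r in zip(link, node)])

# E[min(L, R)] = E[L] + E[R] - E[max(L, R)]
lhs, rhs = e_min, mean(link) + mean(node) - e_max
print(lhs, rhs)
```

Because min(l, r) + max(l, r) = l + r holds for every sample pair, the two sides agree up to floating-point rounding, illustrating why the expected link-node lifetime can be obtained from E[L], E[R], and E[max(L, R)].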

Monitoring Period Estimation.
Based on the above statistical analysis, we propose a method for choosing an appropriate value of the monitoring period. This method is low-cost and well suited to resource-constrained networks such as sensor networks. We also propose a second method that incurs some communication cost and can be implemented on nodes with higher capabilities, such as mobile sinks or nodes in mobile and vehicular ad hoc networks.

Low-Cost Method.
We assume that the monitoring node has no information about the monitored node's velocity, position, or residual battery, and that it wants to ensure that p% of its links are observable, that is, that they last for a duration > Δ. As the link-node lifetime T follows a Phased Bi-Weibull distribution, the appropriate Δ is obtained from its CDF as the value t for which P(T ≤ t) = (100 − p)/100, so that the remaining p% of links last longer than Δ.
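In practice the same percentile can be read off an empirical sample of link-node lifetimes instead of the fitted Phased Bi-Weibull CDF; the sketch below uses that empirical stand-in, and the function name and toy sample are ours.

```python
def monitoring_period(lifetimes, p):
    """Largest Delta (from an empirical lifetime sample) such that at least
    p% of the observed link-node lifetimes exceed Delta.  The sorted sample
    acts as a stand-in for the fitted Phased Bi-Weibull CDF."""
    xs = sorted(lifetimes)
    k = max(0, (len(xs) * (100 - p)) // 100 - 1)  # (100-p)th-percentile index
    return xs[k]

sample = list(range(1, 11))            # toy lifetimes of 1..10 seconds
print(monitoring_period(sample, 80))   # 2: eight of the ten lifetimes exceed 2 s
```

A smaller Δ makes more links observable but gives the IDS less time per observation, so the monitoring node would pick the largest Δ that still meets its target percentage p.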

High-Cost Method.
We assume that each node j can estimate its remaining battery power B_j and its rate of energy dissipation EDisip_j per time period Δ; an ultraconservative estimate of its residual node lifetime is then R_j = (B_j / EDisip_j) · Δ. Each node j periodically broadcasts a beacon message containing its residual node lifetime R_j and its position obtained from GPS. Upon receiving such a message from node j, node i first calculates d_ij, the distance separating it from its neighbor j. The relative velocity of node j with respect to node i is sqrt(v_i² + v_j² − 2 v_i v_j cos θ), where v_i and v_j are the velocities of nodes i and j, respectively, and θ denotes the angle between their velocity vectors in the Cartesian coordinate system. The relative velocity is maximal when v_i = v_j = vmax and θ = 180°, in which case it equals 2·vmax. Node i then calculates a conservative estimate of the residual link lifetime, that is, the minimum time for node j to move out of the transmission range of node i. With TR denoting the transmission range, the residual link lifetime is L_ij = (TR − d_ij) / (2·vmax). Finally, node i estimates the residual link-node lifetime as T_ij = min(R_j, L_ij).
Therefore, the monitoring period used to observe the monitored node must be less than this residual link-node lifetime.
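The high-cost method reduces to a few arithmetic steps; the sketch below reconstructs them from the definitions in the text (the function and parameter names are illustrative, and the formulas for the residual node and link lifetimes are our reading of the garbled equations).

```python
def residual_node_lifetime(battery, edisip, delta):
    """Ultraconservative node lifetime: remaining energy divided by the
    energy dissipated per monitoring period delta, converted to time units."""
    return (battery / edisip) * delta

def residual_link_lifetime(d_ij, tr, vmax):
    """Minimum time before the neighbour can leave the transmission range:
    the remaining distance to the range boundary, covered at the worst-case
    relative speed of 2 * vmax."""
    return (tr - d_ij) / (2 * vmax)

def link_node_lifetime(battery_j, edisip_j, delta, d_ij, tr, vmax):
    """Residual link-node lifetime; the monitoring period must stay below it."""
    return min(residual_node_lifetime(battery_j, edisip_j, delta),
               residual_link_lifetime(d_ij, tr, vmax))

# A neighbour 10 m away, TR = 30 m, vmax = 20 m/s: the link can break
# after (30 - 10) / 40 = 0.5 s at the earliest.
print(link_node_lifetime(battery_j=100.0, edisip_j=10.0, delta=1.0,
                         d_ij=10.0, tr=30.0, vmax=20.0))   # 0.5
```

Taking the worst-case relative speed makes the estimate conservative: any real neighbour trajectory keeps the link up at least this long, so a monitoring period below this bound guarantees one observation.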

Conclusion
In this paper, we have proposed IDS solutions for three aspects of dynamicity in ad hoc and wireless sensor networks. The magnitude dynamicity aspect is solved by defining a normal profile based on invariants derived from normal node behavior. We have generated a dependency graph consisting of strongly correlated features and derived high-level features from it, obtained by applying a divide-and-conquer strategy to the maximal cliques algorithm and the maximum weighted spanning tree algorithm. Simulation results show that the IDS can achieve a detection rate of 100% when the malicious behavior is not similar to the normal one. It can also achieve a false positive rate of 0% when the duration of the training period exceeds a given value. To handle the nature dynamicity aspect, we have adopted a carrot-and-stick strategy to prevent a malicious node from evading the IDS: an adaptive reputation fading strategy that allows fast redemption and fast capture of a malicious node. We have analytically studied the link-node lifetime distribution and shown that it can be approximated by a Phased Bi-Weibull distribution. Based on this analysis, we have proposed a low-cost method to estimate the minimum monitoring period required to observe a monitored node's behavior. In addition, based on some topology information, we have proposed a high-cost method designed for networks whose nodes are less constrained by resource limitations.

Figure 1: Impact of feature choice on false positive rate.

Figure 3: Normal profile and minimum normal profile.

Figure 7: Positive and negative discount factors.

λ(+) and λ(−) denote the discount factors for past positive and negative histories, respectively; their values fall into the range [0, 1]. According to the value of the reputation, the reputation system executes one of the following two fading strategies. (i) Reward Strategy. It is applied when the reputation is at least a threshold th ∈ [0, 1]. The IDS forgets the negative history more quickly than the positive one (i.e., λ(+) > λ(−)); this strategy is used when a node is well behaved.

Table 1: Relation between attacks and features.

Table 3: Comparison between the theoretical and approximated Weibull parameter.