Robust Monitor Assignment with Minimum Cost for Sensor Network Tomography

In wired networks, monitor-based network tomography has proved to be an effective technique for measuring internal network state. Existing wired network tomography approaches assume that the network topology is relatively static. However, the topology of a sensor network usually changes over time due to wireless dynamics. In this paper, we study the problem of assigning a number of sensor nodes as monitors in large scale sensor networks, so that the end-to-end measurements among monitors can be used to identify hop-by-hop link metrics. We propose RoMA, a Robust Monitor Assignment algorithm, to assign monitors in large scale sensor networks with dynamically changing topology. RoMA includes two components: confidence-based robust topology generation and cost-minimized monitor assignment. We implement RoMA and evaluate its performance based on a deployed large scale sensor network. Results show that RoMA achieves high identifiability with dynamically changing topology and is able to assign monitors with minimum cost.


Introduction
Network tomography techniques [1] use end-to-end measurements to calculate hop-by-hop link metrics, such as delay and packet reception ratio. Recent advances in network tomography techniques [2][3][4] show that cycle-free measurement paths among monitors can be used to form a linear system on the unknown internal link metrics. These unknown link metrics can then be calculated by solving the linear system. To solve the linear system successfully, the monitors must support a sufficient number of linearly independent measurement paths, which requires the monitor assignment to satisfy certain conditions.
Existing works assume relatively static network topologies and a uniform cost of assigning monitors, since they all focus on wired communication networks. These works therefore focus on calculating the minimum monitor assignment that enables sufficient linearly independent measurement paths. In wireless sensor networks (WSNs), however, the network topology keeps changing over time [5,6] due to wireless dynamics. Further, the cost of assigning a node as a monitor in a deployed network usually differs from node to node, depending on the environmental conditions, the location of each node, and so forth. Therefore, existing techniques cannot be applied to WSNs directly.
In this paper, we focus on calculating a minimum cost monitor assignment in a WSN, given the dynamically changing network topology. In particular, we propose RoMA, a Robust Monitor Assignment algorithm, to assign monitors with minimum cost in large scale sensor networks with dynamically changing topology. RoMA includes two components: confidence-based robust topology generation and cost-minimized monitor assignment. The confidence-based robust topology generation merges multiple historical topologies based on a confidence value. It generates a robust topology, which captures how the topology changes over time. Based on this robust topology, the cost-minimized monitor assignment component of RoMA assigns monitors with minimum overall cost.
We implement RoMA and evaluate its performance through extensive simulations based on a deployed large scale sensor network. Compared with MMP [2], RoMA assigns fewer monitors with high link metric identifiability and achieves a much smaller overall cost.

International Journal of Distributed Sensor Networks
The rest of the paper is organized as follows. Section 2 reviews related work. Section 3 formulates the problem. Section 4 describes the design of RoMA. Section 5 presents evaluation results. Finally, Section 6 concludes the paper.

Related Work
Based on the model of link metrics, existing work on measuring the network internal state can be broadly classified as hop-by-hop and end-to-end approaches. Hop-by-hop approaches use diagnostic tools such as traceroute, pathchar [7], and Network Characterization Service (NCS) [8] to measure hop-by-hop link metrics directly. By sending multiple probes with different time-to-live fields, traceroute can measure the delay of each hop on the probed path. Pathchar uses a similar approach to measure hop-by-hop delays, capacities, and loss rates. NCS also reports available capacities of each hop.
End-to-end approaches use end-to-end metrics to calculate internal link metrics. They assume the network is controllable; otherwise, the minimum monitor assignment problem has been proved to be NP-hard [9][10][11]. The basic idea is to build a linear system from the path measurements and use linear algebraic techniques to calculate the unknown link metrics [12,13]. When cyclic measurement paths are allowed, [14] gives the necessary and sufficient conditions on the network topology. Since routing along cycles is typically prohibited in real networks, [2] gives the necessary and sufficient conditions on the network topology when only cycle-free measurement paths are used.
However, in WSNs, the network topology keeps changing over time and existing techniques cannot be applied directly. In this paper, we focus on calculating a minimum cost monitor assignment in a WSN, given the dynamically changing network topology, to calculate all link metrics. We propose RoMA, a Robust Monitor Assignment algorithm, to assign monitors with minimum cost in large scale sensor networks with dynamically changing topology.

Problem Formulation
We model the network topology as an undirected graph $G = (V, L)$, where $V$ is the set of nodes and $L$ is the set of links. Each link $l_i \in L$ is associated with an unknown metric $W_{l_i}$. We assume that link metrics are symmetric in both directions. We also assume that the link metric $W_{l_i}$ does not change during the measurement period. Taking delay as an example, [15] shows that the delays of the same link within a relatively short period of time are similar. Monitors are certain nodes in $V$ which can initiate/collect cycle-free measurements. They can control the routing of measurement packets.
Let $W = (W_{l_1}, \ldots, W_{l_n})^T$ denote the column vector of all link metrics and $C = (C_{P_1}, \ldots, C_{P_\gamma})^T$ the column vector of all available path measurements, where $n$ and $\gamma$ are the number of links and measurement paths, respectively, and $C_{P_i}$ is the sum of metrics along measurement path $P_i$. Then we obtain the linear system

$$R W = C,$$

where $R = (R_{ij})$ is a $\gamma \times n$ matrix, with each $R_{ij} \in \{0, 1\}$ indicating whether link $l_j$ is on path $P_i$. A link is identifiable if we can solve for its metric from the above linear system. The network $G$ is completely identifiable if and only if $\mathrm{rank}(R) = n$. If $\mathrm{rank}(R) < n$, it may still be possible to identify some of the link metrics.
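To make the rank condition concrete, the following minimal sketch (not part of RoMA itself) checks identifiability on two toy topologies. The helper names `rank` and `identifiable_links`, the symbol `R` for the 0/1 path matrix, and both toy topologies are illustrative assumptions. A link is treated as identifiable when its unit vector lies in the row space of the path matrix, which is one standard way to test the partially identifiable case.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix, by Gauss-Jordan elimination over exact rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                f = m[i][col] / m[r][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def identifiable_links(R):
    """Indices j whose unit vector e_j lies in the row space of R."""
    n = len(R[0])
    base = rank(R)

    def unit(j):
        return [1 if k == j else 0 for k in range(n)]

    # Appending e_j leaves the rank unchanged exactly when e_j is already
    # a linear combination of the measured paths.
    return [j for j in range(n) if rank(R + [unit(j)]) == base]


# Toy star (assumed): monitors m1, m2, m3 around a relay x, with links
# l1=(m1,x), l2=(m2,x), l3=(m3,x). Rows are the paths m1-m2, m1-m3, m2-m3.
R_star = [[1, 1, 0],
          [1, 0, 1],
          [0, 1, 1]]
print(rank(R_star) == len(R_star[0]))  # completely identifiable?

# Toy partial case (assumed): two monitors joined by a direct link l1
# and by a two-hop detour over links l2 and l3.
R_partial = [[1, 0, 0],
             [0, 1, 1]]
ids = identifiable_links(R_partial)
print(ids, len(ids) / len(R_partial[0]))  # identified link indices and ratio
```

Exact rational arithmetic is used instead of floating point so that the rank test is not affected by rounding; for large networks a numerical linear algebra library would be the more practical choice.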
We want to assign a number of nodes in the network as monitors to initiate/collect measurement packets. In the current problem formulation, we assume that all link metrics are unknown before the network tomography. After assigning monitors, we can use the algorithm STPC [3] to efficiently find a set of linearly independent paths between monitors. Each of these paths represents a row of $R$, and the sum of metrics along the path is an element of $C$. We can therefore obtain the unknown link metrics by solving for $W$ given $R$ and $C$. Since assigning a node as a monitor usually incurs a nonnegligible operational cost (e.g., hardware/software, human efforts), we focus on assigning monitors with minimum cost to identify most of the links.
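As an illustration of the final solving step, the sketch below recovers link metrics from path sums once a square, full-rank set of measurement paths is available. It does not implement STPC; the paths are picked by hand for a toy star topology, and the function name `solve_link_metrics` and the example metric values are assumptions made only for this example.

```python
from fractions import Fraction


def solve_link_metrics(R, C):
    """Solve R W = C for W by Gauss-Jordan elimination.

    R must be square with full rank, i.e. one linearly independent
    measurement path per link.
    """
    n = len(R)
    aug = [[Fraction(x) for x in row] + [Fraction(c)] for row, c in zip(R, C)]
    for col in range(n):
        pivot = next((i for i in range(col, n) if aug[i][col] != 0), None)
        if pivot is None:
            raise ValueError("rank deficient: not all links are identifiable")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]          # normalize pivot row
        for i in range(n):
            if i != col and aug[i][col] != 0:
                f = aug[i][col]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[col])]
    return [row[-1] for row in aug]


# Toy star (assumed): three monitors around one relay, three links.
# Rows are the monitor-to-monitor paths; columns are the links l1, l2, l3.
R = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 1]]
true_w = [2, 3, 5]                                    # hypothetical link delays
C = [sum(r * w for r, w in zip(row, true_w)) for row in R]  # measured path sums
print(solve_link_metrics(R, C) == true_w)
```

With noisy measurements or more paths than links, a least-squares solve would replace the exact elimination used here.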

Design
The design of RoMA includes two components: confidence-based robust topology generation and cost-minimized monitor assignment. The confidence-based robust topology generation algorithm uses instant topologies to generate a robust topology. The cost-minimized monitor assignment algorithm selects a subset of nodes in the robust graph as monitors with the minimized cost. The set of monitors can identify all links in the robust graph and the majority of links in future topologies.

Confidence-Based Robust Topology Generation.
Due to wireless dynamics and interference, a node usually transmits its packets to different receivers at different times. Therefore, RoMA first generates a robust topology of a WSN for monitor assignment. The input of RoMA is a number of packets received by the sink. Each packet $k$ carries three data fields related to RoMA: the origin $o(k)$, the parent $p(k)$, and the global packet generation time $t(k)$. The origin $o(k)$ and the parent $p(k)$ are the first two hops of $k$'s routing path. The global packet generation time can be obtained by a packet timestamping technique without global time synchronization [16]. By using each packet's origin and parent, we can construct the topology of the WSN.
Let $G_i$ denote an instant topology constructed from the set of packets sent by nodes to their parents in a period $T$. With a set of packets having different sending times, a number of instant topologies can be constructed. We then use a set of instant topologies $\{G_1, \ldots, G_i, \ldots\}$ to generate a robust topology.
As described in Algorithm 1, the inputs are a set of packets $K = \{k_1, \ldots, k_m\}$ received by the sink, the period $T$, and the minimum confidence $c_{\min}$. These packets have different sending times, so we can get a set of instant topologies $\mathcal{G} = \{G_1, \ldots, G_N\}$ (line 1). Let $E$ be the set of all links that appear in the instant topologies in $\mathcal{G}$ (line 2). For each link $l$ in $E$, we compute its confidence $c_l$ and compare it with the minimum confidence $c_{\min}$, where $|\mathcal{G}|$ denotes the number of instant topologies in $\mathcal{G}$ and $N_l$ the number of instant topologies in $\mathcal{G}$ that contain link $l$. If $c_l$ is at least $c_{\min}$, we select $l$ as a link in the robust topology $G_r$ (lines 4-7).

Algorithm 1: Confidence-based robust topology generation.
...
(6) if $c_l \ge c_{\min}$ and $l$ is not in $G_r$ then
(7)     select $l$ as a link in $G_r$
return $G_r$
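The merging procedure described above can be sketched as follows. This is an illustrative reading, not the authors' implementation: it assumes the confidence of a link is the fraction of instant topologies containing it (the count of topologies with the link divided by the total number of instant topologies), that links are undirected, and that periods with no packets produce no instant topology. The function names and the tuple layout `(origin, parent, time)` are hypothetical.

```python
from collections import Counter


def instant_topologies(packets, period):
    """Group (origin, parent, time) packets into one link set per period."""
    buckets = {}
    for origin, parent, t in packets:
        link = frozenset((origin, parent))            # undirected link
        buckets.setdefault(int(t // period), set()).add(link)
    # Periods without any packet are skipped (an assumption of this sketch).
    return [buckets[k] for k in sorted(buckets)]


def robust_topology(topologies, c_min):
    """Keep every link whose confidence reaches the threshold c_min."""
    counts = Counter(link for topo in topologies for link in topo)
    total = len(topologies)
    return {link for link, n_l in counts.items() if n_l / total >= c_min}


# Hypothetical trace: nodes 1 and 3 report to changing parents over 4 periods.
packets = [
    (1, 2, 0), (3, 2, 5),      # period 0
    (1, 2, 12), (3, 4, 15),    # period 1
    (1, 2, 21), (3, 2, 25),    # period 2
    (1, 4, 33), (3, 2, 38),    # period 3
]
topos = instant_topologies(packets, period=10)
print(robust_topology(topos, c_min=0.5))
print(robust_topology(topos, c_min=1.0))
```

Raising `c_min` keeps only links that persist across more periods, so the merged topology becomes sparser.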

Cost-Minimized Monitor Assignment.
RoMA then assigns monitors with minimum cost in the robust topology $G_r$. Before describing the algorithm, we first introduce several graph theory concepts.
(i) A graph is connected if there is a path from any point to any other point in the graph.
(ii) A $k$-connected component of $G$ is a maximal subgraph of $G$ that is either (a) $k$-vertex-connected or (b) a complete graph with up to $k$ vertices. The case of $k = 2$ is also called a biconnected component and $k = 3$ a triconnected component.
(iii) A cut-vertex is a vertex whose removal will disconnect the graph.
(iv) A 2-vertex cut is a set of two vertices $\{v_1, v_2\}$ such that removing $v_1$ or $v_2$ alone does not disconnect $G$, but removing both disconnects $G$. Each vertex of $\{v_1, v_2\}$ is a 2-cut-V.
(v) Nodes that are cut-vertices or part of 2-vertex cuts are called separation vertices.
Figure 1 shows an example which illustrates the above concepts. In this example, the whole graph is a connected graph. It contains two biconnected components, which are separated by a cut-vertex. There is also a triconnected component shown in the figure, which is connected to the graph by a 2-vertex cut.
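As a small illustration of the cut-vertex concept, the sketch below finds cut-vertices with the classic depth-first search low-link method. It is a generic textbook technique, not the partitioning algorithm cited in the paper; the function name, the adjacency-dictionary layout, and the example graph (two triangles sharing one node) are assumptions made for illustration. The recursive form is only suitable for small graphs.

```python
def cut_vertices(adj):
    """Cut-vertices of a simple undirected graph given as {node: set(neighbors)}."""
    disc, low, cuts = {}, {}, set()
    counter = [0]

    def dfs(u, parent):
        disc[u] = low[u] = counter[0]
        counter[0] += 1
        children = 0
        for v in adj[u]:
            if v == parent:
                continue
            if v in disc:
                # Back edge: u can reach an earlier-discovered node.
                low[u] = min(low[u], disc[v])
            else:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                # v's subtree cannot climb above u, so removing u cuts it off.
                if parent is not None and low[v] >= disc[u]:
                    cuts.add(u)
        # The DFS root is a cut-vertex only if it has several DFS children.
        if parent is None and children > 1:
            cuts.add(u)

    for node in adj:
        if node not in disc:
            dfs(node, None)
    return cuts


# Two triangles a-b-c and c-d-e that share node c (hypothetical example):
# two biconnected components separated by one cut-vertex.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d", "e"},
    "d": {"c", "e"},
    "e": {"c", "d"},
}
print(cut_vertices(graph))
```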
If all vertices are assigned as monitors, all links are obviously identifiable. However, because assigning a node as a monitor usually incurs a nonnegligible operational cost (e.g., hardware/software, human efforts), RoMA tries to assign monitors with minimum cost to identify most of the link metrics. A recent work, MMP [2], assigns the minimum number of monitors to identify all links in a connected graph. This is the special case in which all vertices have the same cost of being assigned as monitors. Unlike MMP, RoMA calculates a subset of nodes in the robust graph as monitors with the minimum cost.
Ma et al. [2] show that there are 4 rules which must be satisfied to identify a topology with the minimum number of monitors.
(i) A node whose degree is one must be a monitor.
(ii) A node on a tandem of links (that is, a node of degree two) must be a monitor.
(iii) For a subgraph with two cut-vertices or a 2-vertex cut, at least one node other than those cuts must be a monitor.
(iv) Similarly, for a subgraph with one cut-vertex, at least two nodes other than the cut-vertex must be monitors.
As shown in Algorithm 2, the cost-minimized monitor assignment method follows rules (i) and (ii) to select all vertices with degree less than three as monitors (line 1). Then it partitions the graph into a number of biconnected components. It further partitions each biconnected component into a number of triconnected components. Note that there are efficient algorithms for partitioning a graph into biconnected and triconnected components [17,18].
For each triconnected and then biconnected component that contains three or more nodes, the cost-minimized monitor assignment ensures, by adding monitors with the minimum cost in the component where needed, that (i) each triconnected component has at least three nodes that are either separation vertices or monitors (lines 5-7) and (ii) each biconnected component has at least three nodes that are either cut-vertices or monitors (lines 8-9). Finally, Algorithm 2 selects additional monitors with the minimum cost as needed to ensure that the total number of monitors is at least three (lines 10-11). As described in Algorithm 2, for a component $D$, let $s_D$ denote the number of separation vertices, $c_D$ the number of cut-vertices, and $m_D$ the number of (already selected) monitors in $D$.
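The selection steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not Algorithm 2 itself: the partition into components is assumed to be given, `anchors` stands for the separation vertices (or cut-vertices) of a component, costs are assumed distinct, and the function names `low_degree_monitors` and `top_up` are hypothetical.

```python
def low_degree_monitors(adj):
    """Rules (i) and (ii): every node with degree below three becomes a monitor."""
    return {v for v, nbrs in adj.items() if len(nbrs) < 3}


def top_up(component, anchors, monitors, cost, required=3):
    """Cheapest extra monitors so that anchors plus monitors reach `required`.

    component: set of nodes in the component
    anchors:   separation vertices (or cut-vertices) of the component
    monitors:  monitors already selected anywhere in the graph
    cost:      {node: cost of making that node a monitor}
    """
    have = len((anchors | monitors) & component)
    missing = max(0, required - have)
    # Only nodes that are neither anchors nor monitors are candidates.
    candidates = sorted(component - anchors - monitors, key=lambda v: cost[v])
    return set(candidates[:missing])


# Hypothetical component with per-node monitor costs.
cost = {"a": 5, "b": 50, "c": 30, "d": 10, "e": 20}
component = set(cost)

# One separation vertex and one existing monitor: one more node is needed.
print(top_up(component, anchors={"a"}, monitors={"b"}, cost=cost))

# No anchors and no monitors yet: the three cheapest nodes are chosen.
print(top_up(component, anchors=set(), monitors=set(), cost=cost))

# Star topology: the three leaves have degree one and become monitors.
star = {"x": {"m1", "m2", "m3"}, "m1": {"x"}, "m2": {"x"}, "m3": {"x"}}
print(low_degree_monitors(star))
```

Calling `top_up` on the whole node set with no anchors corresponds to the final step that guarantees at least three monitors overall. Setting all costs equal reduces the choice to an arbitrary pick, which matches the uniform-cost special case.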
The output of the cost-minimized monitor assignment component is a set of monitors, which can be used by the algorithm STPC [3] to obtain $R$. As mentioned above, the topology is completely identifiable if and only if the rank of $R$ is $n$. The matrix $R$ represents a set of measurement paths. Therefore, the rank of $R$ is not directly related to the topology generation process but is directly related to the monitor assignment and path selection process. However, the network topology does have an impact on the monitor assignment and the path selection. For example, if the original graph is a triconnected graph, we only need to assign three monitors to identify all links.

Evaluation
In this section, we evaluate the performance of RoMA through a set of simulations based on a deployed large scale sensor network, the CitySee project.

Evaluation Setup.
CitySee is deployed in an urban area to collect multidimensional sensing data such as carbon emission, temperature, and humidity. All nodes in CitySee are organized as four subnets. Each subnet has one sink, and these four sink nodes transmit data packets to a base station through 802.11 wireless links. Each node in the network transmits 4 data packets back to the sink node every 10 minutes. We use the trace from one subnet to evaluate the performance of RoMA. The main performance metrics are the number of monitors and the ratio of identified links.
We construct a set of instant topologies using period $T$ and merge $N$ topologies into a robust one with different confidence $c_{\min}$. Then we assign monitors with minimum cost in the robust topology and use these monitors to identify a future topology.
In the simulations, we study the impact of different parameters on the performance of RoMA. The parameters are the confidence $c_{\min}$, the period $T$, and the number of merged topologies $N$. When changing one parameter, we keep the other parameters constant.

Simulation.
First we study the impact of the period $T$. With $N = 1$ and $c_{\min} = 1$, we set the period $T$ to hourly, semidaily, and daily, respectively. From Table 1 we can see that, with different temporal resolutions (hourly, semidaily, and daily), the number of monitors changes drastically. That is because, with a high temporal resolution, the topology is sparse and needs more monitors to identify it. As shown in Table 1, when $T$ = hour, we get 168 monitors, with which 96.6% of links can be identified. However, if we set $T$ = daily, with only 46 monitors, the percentage of links which can be identified reaches 89.6%. To keep the number of monitors reasonable, we choose period $T$ = daily.

[Figure legend: Node, Monitor. Panel label: Total cost = 26957]
Then we study the impact of the number of merged topologies $N$. Setting $T$ = daily, we merge different numbers of topologies into a robust one with $c_{\min}$ = 0.6 and 0.8, respectively. $N$ has an impact on the structure of the robust topology, but the impact is not linear. That is to say, a larger $N$ does not mean a better performance. On the other hand, to identify more links, we need more monitors, which results in a higher cost. Considering the cost, it is unreasonable to assign a lot of monitors. Table 3 shows the impact of $N$. When $N = 5$, the number of monitors and the identifiability are both acceptable, and the identifiability does not change much when $N$ increases. To achieve a balance between the number of monitors and the percentage of identified links, we set $N = 5$ with $T$ = daily.
The confidence has a positive influence on the number of monitors and links which can be identified. We merge 5 instant topologies into a robust one with $c_{\min}$ = 0.4, 0.6, 0.8, and 1, respectively, and show the result in Table 2. Then we compare the results with MMP in Figure 2. It is easy to see that high confidence leads to a sparse topology which needs more monitors to be identified. As shown in Figure 2, $c_{\min}$ = 0.8 achieves a better performance than the others, with a high identifiability and a reasonable number of monitors.
From the above, we set $N = 5$, $c_{\min} = 0.8$, and $T$ = daily and compare the performance of RoMA and MMP. Assuming that each node has a fixed cost, we use the two sets of monitors obtained from RoMA and MMP, respectively, to identify a future topology and show the results in Figure 3. Blue points are monitors, with point size denoting cost. Red edges are the links which cannot be identified using those monitors. Figure 3(a) shows RoMA's results and Figure 3(b) MMP's results. Unlike MMP, RoMA calculates a subset of nodes in the robust graph as monitors with the minimum cost. Further, RoMA merges multiple topologies to obtain a robust topology, reducing the number of monitors assigned. Therefore, as shown in Figure 3, RoMA assigns fewer monitors than MMP, and the cost of its monitors is not larger than that of MMP's.
We also evaluate the performance of RoMA using some other real network topologies. We use the Internet Service Provider (ISP) topologies from the Rocketfuel [19] project, which represent physical connections between backbone/gateway routers of several major ISPs around the globe. We generate the costs as random numbers between 1 and 1000. The network topologies are relatively static, so merging topologies into a robust one is not necessary. We use RoMA to assign monitors with parameters $T$ = daily, $N = 1$, and $c_{\min} = 1$ and then use these monitors to identify other topologies. Results are shown in Figure 4. The x-axis shows the topologies obtained on different days, and the y-axis the percentage of identified links. The identifiability of each topology shown in the figure is higher than 96%,

Figure 1: An example that illustrates several graph theory concepts.

Figure 4: The performance of RoMA using ISP topologies.

Table 1: Impact of temporal resolutions.

Table 2: Impact of different confidence.