Data Gathering Techniques for Wireless Sensor Networks: A Comparison

We study the problem of data gathering in wireless sensor networks and compare several approaches from different research fields; in particular, we investigate data gathering techniques related to signal processing, compressive sensing, information theory, and networking. Specifically, we derive a simple analytical model able to predict the energy efficiency and reliability of different data gathering techniques. Moreover, we carry out simulations to validate our model and to compare the effectiveness of the above schemes by systematically sampling the parameter space (i.e., number of nodes, transmission range, and sparsity). Our simulation and analytical results show that no single data gathering technique is best for all possible applications and that the trade-off between energy consumption and reliability should drive the choice of the data gathering technique to be used. In this context, our model can be a useful tool.


Introduction
Wireless sensor networks (WSNs) are composed of a large number of tiny, low-power, and cheap wireless sensors, deployed in a geographic area to perform distributed tasks, for example, to monitor a physical phenomenon [1]. In 2004, MIT Technology Review ranked WSNs as the number one emerging technology [2], and today they are effectively employed in many applications, such as surveillance (e.g., real-time area audio or video surveillance), security (e.g., detection of biological agents or toxic chemicals), habitat monitoring (e.g., environmental measurement of temperature, pressure, or mechanical vibration), home automation, military systems, and, in general, scientific experiments.
In a typical WSN topology, we can distinguish between ordinary wireless sensor nodes and base stations named sinks. The sink is usually connected to a power supply and is capable of performing more complex operations than the ordinary nodes. Ordinary wireless sensor nodes, which are capable of transferring processed or raw sensed data to the sink, are instead, for economic reasons, usually powered by small batteries that in most application scenarios are difficult or even impossible to replace or recharge.
So, in contrast to many other wireless devices (e.g., cellular phones, PDAs, and laptops), the energy supply of a wireless sensor node is usually not expected to be renewed during the life of the WSN. For this reason, each sensor node is required to work under very low power consumption.
In general, to design a highly energy-efficient WSN, it is extraordinarily important to take into account capture, transmission, and routing issues, that is, the data gathering techniques that specify how ordinary sensors gather information and deliver it to the sink. As a consequence, data gathering is the main and most critical function provided by a WSN.
The main aim of this paper is to compare some of the state-of-the-art data gathering techniques, considering their trade-off between reliability (i.e., packet loss and reconstruction error) and energy consumption (i.e., network lifetime) and taking into account both compression and networking aspects. To the best of our knowledge, this is the first paper that considers this type of comparison for data gathering techniques belonging to different research fields (i.e., signal processing, compressive sensing, information theory, and networking related techniques are discussed and compared in this paper). Specifically, we derive a simple analytical model able to predict the energy efficiency and reliability of several data gathering techniques. Moreover, we carry out simulations to validate our model and to compare the effectiveness of the above schemes by systematically sampling the parameter space (i.e., number of nodes, transmission range, and sparsity).
The rest of the paper is organized as follows. In Section 2, we present a summary of related works. In Section 3, further details about existing data gathering techniques are provided, highlighting their advantages and drawbacks. In Section 4, the simulation scenario used for comparisons is detailed and an analytical model able to predict the energy efficiency and reliability of different data gathering techniques is derived. In Section 5, the metrics used to compare data gathering techniques are introduced. In Section 6, simulation results are provided and the developed analytical model is validated. Finally, in Section 7, some concluding remarks are drawn and future work is outlined.
For the sake of clarity, symbols and notations used throughout the paper are reported in Notations section.

Related Works
In the past few years, several data gathering techniques have been proposed with the main aim of reducing energy consumption in WSNs by exploiting correlations among sensory data. We can divide them into two broad categories: compression-oriented and networking-oriented.
Further details about compression-oriented data gathering techniques will be provided in Section 3. In particular, signal processing, compressive sensing, and information theory related techniques are discussed, respectively, in Sections 3.1, 3.2, and 3.3.
Since radio transmission is the primary source of power consumption in WSNs, a second category of data gathering techniques, named networking-oriented, has dealt with the problem of maximizing network lifetime by taking into account network protocols and, more specifically, forwarding/routing mechanisms [14][15][16][17].
In particular, in [14], the authors show how it is possible to maximize the lifetime of a WSN by exploiting routing algorithms. In [15], the authors study the problem of forest construction for maximizing the network lifetime and adopt a simple data aggregation model where an intermediate sensor can aggregate multiple incoming messages into a single outgoing message. A different approach is proposed in [16] where a smart splitting technique is used in order to achieve different trade-offs between reliability and energy saving. Finally, in [17], the authors address the problem of maximizing network lifetime by taking into account also latency and reliability.
However, with the exception of [15], the above papers do not perform any kind of data aggregation with the aim of not introducing extra delay.
Other works, instead, combine routing with data aggregation. In [18], the problem of jointly optimizing routing and data aggregation is investigated; in [19], the authors combine data compression and multipath routing techniques to obtain a reliable and low-latency data aggregation scheme; in [20], an energy-balanced data gathering and aggregating scheme is proposed which integrates a clustering hierarchical structure with compressive sensing to optimize and balance the amount of data transmitted; finally, in [21], a data gathering technique based on the network coding paradigm is proposed.
Further details about the above works will be provided in Section 3.4.
Several other papers exist which compare signal processing techniques in WSNs from energy efficiency and network lifetime perspectives and several works highlight the effect of using different routing mechanisms for data gathering.
Nevertheless, techniques belonging to different research fields such as compressive sensing, information theory, and networking are seldom evaluated against one another and this is the main goal of this paper.
Only recently have such comparisons started to appear; for instance, in [22] lossy data aggregation techniques are evaluated and compared in terms of reconstruction error and energy consumption. However, the authors do not consider the impact of network reliability (i.e., packet loss), nor do they compare networking-based data gathering techniques such as those based on the network coding paradigm.
The aim of this paper is to fill this gap by investigating the effectiveness of all the above data gathering techniques also in terms of reliability. In particular, an analytical model able to predict the energy efficiency and reliability of different data gathering techniques is derived.

Data Gathering Techniques
We can classify data gathering schemes on the basis of the research field from which the technique used to exploit correlation among sensor nodes is drawn, that is, (i) signal processing; (ii) compressive sensing; (iii) information theory; (iv) networking.
Techniques belonging to the above fields are discussed in the next subsections by highlighting their advantages and drawbacks.

Signal Processing Techniques.
Frequently, high correlations (spatial and/or temporal) among sensor readings exist. In this case, it is inefficient to deliver the entire raw data to the destination [8,9], and signal processing, in particular Transforms and Encoding Compression (TEC) techniques, can be exploited in order to reduce the amount of data to send.
In the case of local TEC techniques, each node collects measurements following the Shannon-Nyquist sampling theorem; these measurements are transformed and properly encoded, and the output of the transformation is stored in the payload of one or more packets and sent to the sink. In particular, either lossy or lossless techniques can be used depending on the particular application scenario.
With lossy techniques [3], the original data are compressed by discarding part of the original information; this allows achieving higher compression ratios, but at the receiver side the data can be reconstructed only with a certain accuracy.
However, in some types of monitoring, the accuracy of observations is critical for understanding the underlying physical processes. In other cases, it is not possible to have a priori knowledge about the magnitude of observational errors that can be tolerated without affecting correct data gathering. Moreover, some application domains (e.g., body area networks (BANs), in which sensor nodes permanently monitor and log vital signs) demand sensors with high accuracy and cannot tolerate measurements corrupted by lossy compression processes.
In all these kinds of WSNs, lossless data gathering is essential and desirable. Examples of local lossless compression schemes have been proposed in [4,23,24].
Lossy compression techniques have been evaluated and compared in terms of reconstruction error and energy consumption in [22]; therefore, in this paper we concentrate our attention on lossless techniques.
For the sake of space, we do not consider distributed TEC techniques in this paper but refer the reader to [13]. Nevertheless, the major distributed signal processing approaches applied to WSNs will be discussed in the next subsections.

Compressive Sensing.
Compressive sensing (CS) is a new paradigm introduced by Candes and Tao [11] and Donoho [12], used to capture and compress signals in WSNs, in which compression and sampling are merged and carried out at the same time. Basically, CS compresses a signal while acquiring data at its information rate (without relying on the Shannon-Nyquist sampling theorem). CS theory states that if a signal is sparse or compressible in a certain basis, then it can be reconstructed from a small number of linear measurements by solving an l1-based convex optimization problem [7].
More precisely, let us define k-sparse signals x = (x_1, ..., x_N) as signals that can be expressed as x = Ψs, where Ψ is an orthonormal transform and s is a vector with at most k << N nonzero entries; CS theory states that x can be recovered from M = O(k log(N/k)) linear combinations of measurements obtained as y = Φx, where Φ is an M × N matrix.
Note that, since k is small in comparison to N, it follows that M can be much smaller than N, and therefore high compression ratios can be achieved using CS (i.e., by transmitting the M CS measurements y instead of the N raw data x). Reconstruction is achieved by solving a convex optimization problem of the following form:

min ||s||_1 subject to y = As,

where A = ΦΨ. Once the above problem is solved, x can be recovered as x = Ψs. Alternatively, when noisy measurements are considered, the following optimization problem can be used:

min ||s||_1 subject to ||y - As||_2 <= ε,

where ε bounds the noise. Several algorithms exist which are able to solve the above optimization problems (Basis Pursuit [25], OMP [26], and CoSaMP [27], to name just a few), and several theoretical results describe when these algorithms recover sparse solutions. In particular, (i) as proved in [28], a signal can be recovered with high probability if Φ satisfies the Restricted Isometry Property (RIP).
Formally, a matrix Φ satisfies the RIP if, for all k-sparse signals x, there exists a δ ∈ (0, 1) such that

(1 - δ) ||x||_2^2 <= ||Φx||_2^2 <= (1 + δ) ||x||_2^2.

Examples of matrices that satisfy the RIP condition are the ±1 Bernoulli matrix and the Gaussian matrix with entries Φ_{i,j} ~ N(0, σ_Φ^2), where σ_Φ^2 = 1/M. (ii) As proved in [29], in the noise-free case, exact recovery with a Gaussian matrix can be obtained if the number of measurements M grows as k log(N/k), up to a constant factor. We will exploit the above results to derive a reliability model for CS.
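As an illustration of this machinery (the parameter values and the identity sparsifying basis are our choices, not the paper's setup), the following sketch draws a Gaussian Φ with entries of variance 1/M, takes M measurements of a k-sparse signal, and recovers it with a minimal Orthogonal Matching Pursuit:

```python
import numpy as np

rng = np.random.default_rng(0)

N, k = 256, 5          # signal length and sparsity (illustrative values)
M = 64                 # number of measurements, on the order of k log(N/k)

# k-sparse signal s (here the sparsifying basis Psi is the identity, so x = s)
s = np.zeros(N)
support = rng.choice(N, size=k, replace=False)
s[support] = rng.normal(0, 1, size=k)

# Gaussian measurement matrix with entries ~ N(0, 1/M), which satisfies
# the RIP with high probability
Phi = rng.normal(0, 1 / np.sqrt(M), size=(M, N))
y = Phi @ s            # the M CS measurements

def omp(A, y, k):
    """Greedy recovery: pick the column most correlated with the residual,
    then re-fit by least squares on the chosen support."""
    residual, idx = y.copy(), []
    for _ in range(k):
        idx.append(int(np.argmax(np.abs(A.T @ residual))))
        coef, *_ = np.linalg.lstsq(A[:, idx], y, rcond=None)
        residual = y - A[:, idx] @ coef
    s_hat = np.zeros(A.shape[1])
    s_hat[idx] = coef
    return s_hat

s_hat = omp(Phi, y, k)
print(np.linalg.norm(s - s_hat) / np.linalg.norm(s))  # small relative error
```

In the noiseless regime with M well above the sparsity, the greedy recovery is essentially exact; more robust (and more expensive) convex solvers would be used in practice.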
CS can be applied in cluster-based WSNs by considering that each sensor node j in a cluster sends its reading x_j to the cluster head, which multiplies all received readings by random coefficients φ_{i,j}, generating the weighted sums y_i = Σ_j φ_{i,j} x_j, with i ∈ [1, ..., M]. Next, the M values y_i, named CS measurements, are sent to the sink through one or multiple packets.
On the basis of CS theory, under the sparsity condition, by collecting a sufficient number of CS measurements, the sink will be able to reconstruct the original sensor data x.
CS can also be applied in tree-based WSNs by considering that the source node includes in its packet the product of its acquired value and a random coefficient and then sends it to its next-hop node [30]. In this way, CS compression can be performed with low complexity at the source nodes [9] and data traffic over the network is reduced [8]. However, in this case a high number of hops is needed, that is, h = O(L/r), which may lead to high network latency.
In [7], comparisons of CS-based and conventional signal processing techniques for WSNs have been carried out in terms of energy efficiency and network lifetime.
However, there are several challenges that must be addressed in order to use CS: (i) The decoding time for reconstruction can be O(N^3) and therefore prohibitively expensive for large networks. Less expensive algorithms exist (e.g., matching pursuit), but they provide less stable recovery and weaker error bounds on the recovered solution.
(ii) CS assumes that the sensed data has a known constant sparsity, ignoring that the sparsity of real signals varies in temporal and spatial domain. In particular, the sparsifying basis Ψ is assumed to be given and fixed with time, but this is not the case for a realistic WSN scenario, where the signal of interest is unknown and its statistical characteristics can vary over time.
(iii) CS-based techniques introduce nonnegligible losses (recovery errors), reducing reliability, and work best for large-scale networks (at least a thousand nodes).
(iv) Quantization effects: CS theory has mostly focused on real-valued measurements, but in practice measurements must be represented with a finite number of bits. As a consequence, a trade-off exists between the number of measurements M and the number of bits per measurement b_CS.
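As a minimal illustration of this trade-off (arbitrary values, and a plain uniform quantizer rather than any scheme from the literature), the payload grows linearly with the bits per measurement while the worst-case quantization error halves with every extra bit:

```python
def quantize(v, b, A=1.0):
    """Uniform b-bit quantizer over [-A, A]: return the reconstructed value
    of the bin containing v (the b-bit bin index is what would be sent)."""
    levels = 2 ** b
    step = 2 * A / levels
    q = min(levels - 1, max(0, int((v + A) / step)))  # clamp to a valid bin
    return -A + (q + 0.5) * step                      # bin midpoint

measurements = [0.31, -0.72, 0.05, 0.88]              # toy CS measurements
for b in (4, 8, 12):
    err = max(abs(v - quantize(v, b)) for v in measurements)
    total_bits = len(measurements) * b                # M * b_CS bits on air
    print(b, total_bits, round(err, 6))
```

For a fixed bit budget, one can therefore send more coarsely quantized measurements or fewer finely quantized ones, which is exactly the trade-off analyzed in the paper.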
In this paper, we concentrate on the last two problems, by analyzing the effect of sparsity and quantization on energy saving and reliability. Further details on CS and how to exploit it for WSNs will be given in the next sections.

Information Theory Related Techniques.
In order to exploit the correlation of data concurrently acquired by different sensors, distributed source coding (DSC) techniques, inspired by the Slepian-Wolf theorem, can be applied [2]. DSC techniques imply that each sensor node sends its compressed output to the sink for joint decoding. This means that the nodes need to cooperate in groups of two or three, so that one node provides the side information and another can compress its information down to the Slepian-Wolf or Wyner-Ziv limit. Furthermore, DSC approaches are difficult to apply in such scenarios, since they assume that the statistical characteristics of the underlying data distribution are known in advance [9,31].
The most practical and well-known implementation of DSC is DISCUS [5,32], in which the sensor nodes are divided into clusters. For each cluster, one node (the cluster head) sends uncompressed data (as side information) while all the other nodes transmit encoded (i.e., compressed) data.
To encode data, a sensor node first divides all possible values into disjoint sets (named bins) so that values in the same bin have a minimum distance d. Each piece of sensory data is then compressed with a code that identifies the unique bin where the sampled value lies.
To better explain how DISCUS works let us consider a simple example.
Let us suppose that the (quantized) measurements are in the integer range [0, 7] and that all data sensed by different sensors at (almost) the same time differ by at most ±1. Without DSC compression, three bits are needed by each sensor to represent its sensory data. With DSC, instead, only the cluster head sends a three-bit value. The other sensor nodes split the possible values into four bins, {0, 4}, {1, 5}, {2, 6}, and {3, 7}, and send only the two-bit index of the bin containing the sampled value. Now let us suppose that a node senses the value 5 and therefore sends the index 01 (i.e., the bin {1, 5}), and that the sink also receives the value 6 (properly encoded) from the cluster head; in this case, the index 01 is immediately interpreted as 5 without ambiguity, as a consequence of the fact that sensed data can differ by at most 1.
In the above example, only the cluster head transmits 3 bits for each measurement while for all the other nodes 2 bits are enough; therefore, compression is achieved.
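The binning example above can be sketched in a few lines (the function names and structure are ours, for illustration only, not part of DISCUS):

```python
# Values in [0, 7]; bins {0,4}, {1,5}, {2,6}, {3,7}: values in the same bin
# differ by 4, so a 2-bit bin index is unambiguous given side information
# that is within +/-1 of the sensed value.
NBINS = 4

def encode(value):
    """Ordinary node: send only the 2-bit bin index instead of 3 raw bits."""
    return value % NBINS

def decode(bin_index, side):
    """Sink: pick the bin member closest to the cluster-head side information."""
    candidates = [v for v in range(8) if v % NBINS == bin_index]
    return min(candidates, key=lambda v: abs(v - side))

# Cluster head sensed 6 (sent uncompressed); another node sensed 5.
code = encode(5)                     # bin index 1, i.e. the bin {1, 5}
print(code, decode(code, side=6))    # the sink resolves the index back to 5
```

The decoder fails exactly when the head value and the node value differ by d/2 = 2 or more, which is the reliability condition discussed below.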
However, DSC relies on the assumption that the statistical characteristics (i.e., the correlation function) of the underlying data are known a priori, which is difficult to ensure in practical scenarios. For instance, the simple DISCUS scheme discussed above works only if the difference between the value sampled by the cluster head and that sampled by any other node in the same cluster is less than d/2. Moreover, losing the side information (i.e., the cluster head data) causes fatal errors at the decoder, that is, low reliability.
A simple way to improve reliability is to retransmit the cluster head packets multiple times, but this reduces compression efficiency. Therefore, a trade-off exists between energy consumption and reliability, on the basis of the maximum allowed number of retransmissions. In this paper, we investigate such a trade-off.
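Under the simplifying assumption of independent packet losses with probability p (a toy model of ours, not the paper's reliability model), the trade-off can be quantified in a couple of lines:

```python
# The side information survives unless all of its copies are lost, so the
# failure probability is p ** n, while the transmission cost grows linearly
# with the number of copies n.
p = 0.2                      # assumed per-packet loss probability
for n in (1, 2, 3):
    print(n, p ** n)         # copies sent vs. probability of losing all
```

Each extra retransmission thus buys an exponential reliability gain at a linear energy cost, which is why a bounded retransmission count is a natural design knob.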

Routing-Based Techniques.
Since radio transmission is the primary source of power consumption at the nodes, the design of energy-efficient routing is another important topic in the design of data gathering techniques. The basic idea is to route packets through paths that minimize the overall energy consumption for delivering a packet from the source to the destination. The problem focuses on computing the flows and transmission powers that maximize the lifetime of the network, that is, the time at which the first node in the network runs out of energy [18]. Specifically, the energy consumption rate per unit of information transmitted by each node depends on the choice of the next hop, that is, on the routing decision. This choice influences the energy required to reach the sink [14].
One of the most recent works addressing the problem of maximizing network lifetime by taking into account the routing mechanism is [17]. The authors try to achieve both low latency and high reliability. They construct a data gathering tree based on a reliability model, schedule data transmissions for the links on the tree, and assign transmitting power to each link accordingly. However, they do not perform any kind of data aggregation or compression, with the aim of not introducing extra delay.
Data aggregation can be performed on top of the routing algorithm. The aggregation function is usually performed by extracting some statistical values (e.g., maximum, minimum, and average) and then by transmitting only these [15]. In such a way, it is possible to reduce the amount of communicating data in the dense sensor networks and reduce the power consumption. However, this technique loses much of the structure of the original acquired data.
In particular, the authors of [15] study the problem of forest construction for maximizing the network lifetime. They adopt a simple data aggregation model and assume that an intermediate sensor can aggregate multiple incoming B-bit messages, together with its own message, into a single outgoing message. Moreover, they provide a polynomial time algorithm to build the tree and demonstrate that it is close to optimal.
In [14,33,34], the authors consider the problem of maximizing the lifetime of WSNs through routing algorithms, recasting it as a linear programming problem solvable in polynomial time. The proposed algorithm is a shortest-cost-path routing whose link cost combines the transmission and reception energy consumption with the residual energy levels of the two end nodes.
In [18], the authors jointly optimize routing and data aggregation so that the network lifetime can be extended along two dimensions. In the first dimension, the traffic across the network is reduced by data aggregation, so as to reduce the power consumption of the nodes close to the sink. In the second dimension, the traffic is balanced to avoid overwhelming the bottleneck nodes. A smoothing function is used to approximate the original maximization objective by exploiting the special structure of the network; the necessary and sufficient conditions for the optimality of this smoothing function are derived, and a distributed gradient algorithm is accordingly designed.
Yang et al. propose in [35][36][37] a joint design of energy replenishment and data gathering by exploiting mobility. The SenCar, a multifunctional mobile entity, periodically chooses a subset of sensors to visit based on their energy status. It uses wireless energy transfer to deliver energy to the visited sensors while collecting data from nearby sensors via short-range multihop communications and conveying these data to the sink.

Network Coding Based Techniques.
A different approach exploits the network coding (NC) paradigm.
NC is an effective information transmission approach originally introduced by Ahlswede et al. [38] to improve network capacity of multicast networks.
Differently from the classical store-and-forward network paradigm, where nodes simply replicate and forward incoming packets, with NC the intermediate nodes in the network have the ability to forward functions of the received packets (e.g., linear combinations). In this manner, throughput gain, robustness, and energy saving can be achieved by exploiting the fact that each newly generated packet carries information contained in several original packets [39].
NC has received increasing attention also in WSNs as a promising tool to improve network lifetime and reliability by exploiting the broadcast nature of the wireless channel [40].
However, most of the proposed techniques developed so far [41][42][43], whilst being useful for data dissemination (e.g., traffic from the sink node to the sensor nodes), cannot be applied for data collection, which is the most important traffic in WSNs.
In fact, to apply NC for data gathering in WSNs, some issues have to be solved.
(i) Header Overhead. NC schemes are mostly based on random linear codes [44,45], which allow implementing them in a distributed manner but introduce large overhead, because the coefficients used for the linear combinations must be specified in the packet header. The header size is proportional to the number of aggregated packets, which, in the specific case of data collection in WSNs, can be equal to the number of nodes in the network.
(ii) All-or-Nothing Problem. When N packets are combined using NC, the sink has to receive at least N coded packets in order to be able to recover the original information. Thus, even if the sink receives N - 1 packets, it cannot recover any information. Graceful degradation, instead, is desirable in WSNs.
(iii) Delay. The delay introduced by NC might be prohibitive for large networks where a large number of packets should be combined and decoded. Instead, many sensor networks applications, for instance, WSNs developed for control/automation or real-time audio/video streaming, require small bounded delays.
(iv) Duty Cycling. Most of NC schemes are based on overhearing; that is, nodes should remain in active mode to participate in NC-based routing, which increases the energy consumption of the sensor nodes. So, it is difficult to couple NC paradigm and duty-cycling techniques commonly used in WSNs.
(v) Reliability. Full (or at least high) reliability is desirable in sensor networks and is mandatory in several scenarios, for instance, in new scientific experiments, where accuracy of observations is critical, or in the case of biomedical applications, where it is necessary to ensure that important details are not lost causing errors in medical diagnosis. When random codes are used, even in a reliable network, the original messages can be retrieved with "high probability" (though not "certainty"), and high probability is achieved through the use of large finite fields (i.e., large coefficients and therefore large headers).
(vi) Complexity. NC techniques should be simple to cope with low computational and memory resources of sensor nodes. We refer the reader to [46] for further considerations on the applicability of NC to WSNs.
The above issues have been addressed in [16], where the authors propose a new forwarding technique for WSNs based on the Chinese Remainder Theorem (CRT), able to achieve different trade-offs between reliability and energy saving.
Basically, CRT can be seen as a splitting technique able to transform an integer number m into a vector of smaller numbers named CRT components, {m_i}. The CRT components are obtained from the number m using modular arithmetic as m_i = m (mod p_i), where the p_i (with i ∈ [1, ..., N_CRT]) are prime numbers (or at least pairwise coprime integers).
The CRT states that every integer number m can be exactly recovered from its CRT components if the product of the prime numbers, P = ∏_{i=1}^{N_CRT} p_i, satisfies the condition P > m (henceforward named the reconstruction condition). In particular, m can always be recovered through a simple linear combination as

m = (∑_{i=1}^{N_CRT} c_i m_i) (mod P).

The coefficients c_i are given by c_i = q_i · q_i', where q_i = P/p_i and q_i' is its modular inverse, obtained by solving q_i q_i' = 1 (mod p_i).
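The splitting and reconstruction steps above can be sketched as follows (function names are ours; Python's three-argument pow computes the modular inverse):

```python
from math import prod

def crt_split(m, moduli):
    """CRT components m_i = m mod p_i."""
    return [m % p for p in moduli]

def crt_rebuild(components, moduli):
    """Recover m as (sum_i c_i * m_i) mod P, valid while m < P = prod(moduli)."""
    P = prod(moduli)
    total = 0
    for m_i, p_i in zip(components, moduli):
        q_i = P // p_i
        total += m_i * q_i * pow(q_i, -1, p_i)   # c_i = q_i * q_i^{-1} (mod p_i)
    return total % P

moduli = [97, 101, 103, 107]      # pairwise coprime, P = 107972737
m = 654321                        # reconstruction condition: m < P
parts = crt_split(m, moduli)
print(crt_rebuild(parts, moduli))              # recovers m exactly

# Graceful degradation: a subset of components still suffices as long as the
# product of the corresponding moduli exceeds m (here 101*103*107 > m).
print(crt_rebuild(parts[1:], moduli[1:]))      # still recovers m
```

The second call illustrates the property exploited for reliability: losing a component is harmless while the reconstruction condition still holds for the received subset.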
CRT can be applied in WSNs to split the packets produced by sensor nodes. The resulting smaller packets (i.e., the CRT components) can be sent through different paths by exploiting the path diversity of WSNs. The fact that relayers forward smaller packets reduces energy consumption [47].
Moreover, CRT has several advantages in comparison to other NC techniques: (i) the set of prime numbers {p_i} can be chosen so that the information produced by the sensor nodes can be reconstructed even if only a fraction of the CRT components is received by the sink, improving reliability and solving the all-or-nothing problem; (ii) differently from the coefficients used in NC techniques, the set of prime numbers can be obtained directly by the sink (i.e., CRT avoids header explosion); (iii) CRT can be efficiently combined with duty-cycling techniques [48] and distributed compression algorithms [19] to achieve an efficient data aggregation technique.
Considering the above advantages and the fact that this paper is focused on data gathering techniques for WSNs, we will consider CRT as representative of networking-based data gathering techniques.

Simulation Scenario
In this section, we will discuss the WSN model used for comparisons and simulations of data gathering techniques.

Network Model.
We assume a WSN where the sink is located in the center of a square sensing area of size L × L [m^2] and the sensor nodes are randomly distributed with density ρ [nodes/m^2]. Each sensor node has a transmission range equal to r [m] (with r << L/2) and sends its data to the sink through a multihop scheme.
The network is partitioned into nonoverlapping clusters using the procedure described in [16,49]. This procedure is mainly based on the exchange of Initialization Messages (IMs) and organizes the network into clusters while minimizing the number of hops needed by a sensor node to reach the sink. The sink is assumed to belong to cluster 1 (denoted as CL 1) and generates a first IM with its own address and a sequence number SN = 2. Each node which receives an IM from its neighbors with sequence number SN = h will belong to cluster CL h and will retransmit the IM with an increased SN value, together with its own address and the list of the nodes that it will use as forwarders (which it knows from the source addresses specified in the received IMs). On the basis of the received IMs, at the end of the above procedure, each node in the network will know its own next hops and which other nodes will use it as a next hop. Further details on the initialization procedure are reported in [49].
We assume that the above initialization procedure is carried out only once, so we neglect the related energy consumption.
In the following, nodes along the path from a source to the sink are referred to as relayers and nodes located one hop away from the source along the path to the sink are specifically called one-hop relayers.
We assume that, independently of the specific data gathering technique, relayers transmit packets through a load-balancing shortest-path scheme; that is, a node in cluster CL h+1 will randomly select a reachable node in the next cluster toward the sink (CL h), and this forwarding scheme is repeated until the sink is reached. In this manner, information reaches the sink with the minimum number of hops.
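The initialization and forwarding rules above can be sketched on a toy topology (the graph and function names are ours; a breadth-first search from the sink reproduces the hop-count clusters built by the IM exchange):

```python
import random
from collections import deque

random.seed(1)

def build_clusters(adj, sink):
    """Hop-count clusters as built by the IM flooding: the sink is in CL 1
    and a node first reached after h hops joins CL h+1."""
    cluster = {sink: 1}
    queue = deque([sink])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in cluster:
                cluster[v] = cluster[u] + 1
                queue.append(v)
    return cluster

def route_to_sink(node, adj, cluster):
    """Load-balancing shortest path: at each step pick a random neighbor
    lying one cluster closer to the sink."""
    path = [node]
    while cluster[node] > 1:
        nxt = random.choice([v for v in adj[node]
                             if cluster[v] == cluster[node] - 1])
        path.append(nxt)
        node = nxt
    return path

# Toy topology: sink 0; nodes 1-2 one hop away; nodes 3-4 two hops away.
adj = {0: [1, 2], 1: [0, 3, 4], 2: [0, 3], 3: [1, 2, 4], 4: [1, 3]}
cluster = build_clusters(adj, sink=0)
path = route_to_sink(4, adj, cluster)
print(cluster, path)
```

Because every step moves exactly one cluster closer to the sink, the resulting path always uses the minimum number of hops, while the random choice among eligible next hops spreads the relaying load.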

Data Gathering Model.
Until now, we have classified data gathering techniques on the basis of the data aggregation technique used. However, data gathering techniques can also be classified considering the factors that drive data acquisition. In particular, four broad categories can be distinguished [50]: event-driven, time-driven, query-based, and hybrid.
In the event-driven category, data are generated when an event of interest occurs; in the time-driven category, data are periodically sent to the sink at constant time intervals; in the query-based category, data are collected according to sink requests. Finally, the hybrid approach combines one or more of the above.
For simulation purposes, with the aim of evaluating energy consumption and reliability, all the above categories can be unified through an abstraction of the concept of event, that is, by simply considering that data must be sent as a consequence of an event.
In particular, in query-based networks an event is triggered by the reception of the query message, while in time-driven networks the event can be associated with the rising clock edge of the sampling unit or, more practically, with the instant when a sufficient number of measures has been collected and a packet is ready to be transmitted.
Considering the above abstraction, energy consumption and network reliability can be evaluated in terms of number of events (i.e., packets sent), without taking into account what drives the event.
Therefore, in our simulation scenarios we will consider that V events randomly occur in the sensor network and that, for each event, n nodes recognize the event and generate a packet. More precisely, we assume that only the nodes inside the circular area of radius R centered at the location of the event detect the event and therefore need to send a packet. Henceforward, we call the circular area related to an event a cell.
In event-driven networks, usually small packets are sent to specify that an event has been detected (a single -bit word could be sufficient in most cases). Instead in the case of time-driven or query-driven networks, packets represent measures collected in the time interval between two events. Both cases can be taken into account considering that for each event raw information of bits has to be sent for each node by fixing = 1 for event-driven networks and ≥ 1 in the case of time-driven or query-driven networks.
With the aim of reducing energy consumptions (i.e., the overall number of bits sent), raw data are not directly transmitted; instead, data are processed according to the chosen data gathering technique.
More precisely, we have the following.
(i) TEC. Nodes using TEC techniques exploit temporal correlation to reduce the number of bits.
Here we do not consider a specific TEC technique but assume that the compression factor F_TEC of the TEC technique used is known. As a consequence, we can state that, using a TEC technique, each piece of raw data of b bits will be represented after compression with b_TEC = b/F_TEC bits and that for each event a node must transmit m·b_TEC = m·b/F_TEC bits. So, considering that N nodes sense the event, the overall number of bits transmitted for each event when TEC is used is B_TEC = N·m·b/F_TEC. As already stated, we assume that packets are transmitted through the load-balancing shortest-path scheme described above, so information reaches the sink with the minimum number of hops.
(ii) DSC. According to DISCUS, we assume that only one node for each cell (henceforward named the cell head) sends uncompressed measures (i.e., side information) in a packet of m·b bits, while all the other nodes send compressed packets of m·b_DSC = m·b/F_DSC bits. Also in this case we assume that all packets are transmitted through a load-balancing shortest-path scheme. However, to improve reliability, we assume that the cell head transmits its packets n_{r,DSC} times.
So, considering that N nodes sense the event, the overall number of bits transmitted for each event when DSC is used is B_DSC = n_{r,DSC}·m·b + (N − 1)·m·b/F_DSC.
(iii) CS. We assume that the cell head collects the packets of the other nodes in the same cell and sends them by applying CS. More precisely, the collected measures can be represented by a matrix X of m × N values of b bits each, where the i-th column x_i represents the measures taken by the N nodes almost at the same time. Considering that such values are highly correlated in both space and time, by taking a proper transform we obtain with high probability a sparse vector. For instance, we can assume that the DCT is applied to each column vector x_i and that only K DCT coefficients will be nonzero. In this case the cell head needs to send M = O(K·log(N/K)) measurements for each column, that is, m·M measurements for each event.
For simulation purposes, we suppose that CS measurements are represented by b_CS bits and that those measurements are sent through m packets of P_CS = M·b_CS bits each through a load-balancing shortest-path scheme.
So the overall number of bits transmitted by the cell head for each event when CS is used is B_CS = m·M·b_CS. Other choices are possible without altering the overall number of transmitted bits B_CS; for instance, we could have considered M packets of m·b_CS bits each, but the previous choice will simplify comparisons. In particular, we will show that with the above choice the comparison results are independent of m, so our results are valid for both event-driven (m = 1) and time/query-driven (m ≫ 1) techniques.
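The per-event bit counts above can be sketched as follows; this is a minimal illustration, where N is the number of sensing nodes, m the measures per packet, b the bits per raw measure, F_tec and F_dsc the compression factors, n_r the cell-head retransmissions, and M and b_cs the CS measurement count and bit width (the function names and numeric values are ours, for illustration only):

```python
def bits_per_event_tec(N, m, b, F_tec):
    """B_TEC = N*m*b/F_tec: every sensing node compresses its m b-bit
    measures by the TEC compression factor."""
    return N * m * b / F_tec

def bits_per_event_dsc(N, m, b, F_dsc, n_r):
    """B_DSC: the cell head repeats its uncompressed side information n_r
    times; the other N-1 nodes send DSC-compressed packets."""
    return n_r * m * b + (N - 1) * m * b / F_dsc

def bits_per_event_cs(m, M, b_cs):
    """B_CS = m*M*b_cs: the cell head sends M CS measurements of b_cs bits
    for each of the m columns."""
    return m * M * b_cs
```

For example, with N = 45, m = 1, b = 12, and F_tec = 4, TEC transmits 135 bits per event, while DSC with three side-information retransmissions transmits 168.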
(iv) CRT. In this case, we suppose that for each event the source nodes send their packets to a common set of one-hop relayers named CRT relayers. Henceforward, we indicate by N_CRT the number of CRT relayers.
As in the case of CS, the collected measures can be represented by a matrix X of m × N values of b bits each, where the i-th column x_i represents the measures taken by the N nodes almost at the same time.
CRT relayers process the data of each column x_i in two steps:
(1) In the first step, the received data are compressed with a compression algorithm, obtaining a binary sequence s.
(2) In the second step, CRT is applied to improve reliability by splitting the binary sequence s so that each CRT relayer forwards a CRT component.
It is worth noting that each CRT relayer will independently compress the received packets, obtaining the same sequence s. This is possible mainly because, as the relayers receive the same data set and apply the same compression algorithm, the compressed sequence obtained is the same for all of them.
Henceforward, we indicate by b_s the length of the compressed sequence s.
CRT relayers split the binary sequence they have constructed and forward it. Specifically, the sequence is interpreted as an integer = ∑ −1 =0 ⋅ 2 (where are bits of ) and by properly choosing the set of prime numbers { } each CRT relayer calculates and forwards the corresponding CRT component = (mod ). Note that ⌈log 2 ( )⌉ is the number of bits needed to represent , so the overall number of bits transmitted by the CRT relayers is From the theory of CRT, the sink will be able to reconstruct all raw measurements from the CRT components provided that the reconstruction condition is satisfied (i.e., ∏ CRT =1 ≥ 2 ). Note that the reconstruction condition can be satisfied by multiple sets of prime numbers; however, to reduce the number of bits needed to represent values , and therefore the overall number of bits sent, it is preferable to choose the smallest possible set of primes, which we refer to as the

Minimum Primes Set (MPS).
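The MPS search and the CRT split/reconstruction can be sketched as follows; this is a minimal illustration under the assumption that the MPS is the smallest window of consecutive primes satisfying the reconstruction condition (function names are ours):

```python
from math import ceil, log2, prod

def is_prime(n):
    # simple trial division; adequate for the small primes used here
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def mps(n_crt, b_s):
    """Minimum Primes Set: the smallest window of n_crt consecutive primes
    whose product satisfies the reconstruction condition prod(q) >= 2**b_s."""
    window = []
    cand = 2
    while True:
        if is_prime(cand):
            window.append(cand)
            if len(window) > n_crt:
                window.pop(0)
            if len(window) == n_crt and prod(window) >= 2 ** b_s:
                return window
        cand += 1

def crt_split(y, primes):
    """Each relayer forwards one component c_j = y mod q_j."""
    return [y % q for q in primes]

def crt_reconstruct(components, primes):
    """Standard CRT reconstruction at the sink (exact when y < prod(primes))."""
    P = prod(primes)
    y = 0
    for c, q in zip(components, primes):
        n_q = P // q
        y += c * n_q * pow(n_q, -1, q)  # pow(n_q, -1, q): modular inverse
    return y % P
```

Running `mps(4, 40)` returns the set {1019, 1021, 1031, 1033} quoted in the example below, and any 40-bit integer split into its four residues is recovered exactly by `crt_reconstruct`.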
For instance, if N_CRT = 4 and b_s = 40, the MPS will be {1019, 1021, 1031, 1033}. In fact, this is the set of the four smallest consecutive primes that satisfy the relationship ∏_{j=1}^{N_CRT} q_j ≥ 2^40. However, when the set of primes is chosen as above, the message can be reconstructed if and only if all the CRT components are correctly received by the sink. So, to take into account the possible losses due to the unreliability of the wireless medium, we use the MPS with a admissible failures (MPS−a), that is, the set of the smallest consecutive primes that satisfy the reconstruction condition even if a CRT components are lost. As shown in [16], when a, N_CRT, and b_s are fixed, the MPS−a set is unique, so the CRT relayers can obtain the MPS−a in a distributed manner.

Source Model. As shown in [13] and the references therein, the differences between two consecutive samples of several real-world data (temperature, humidity, solar radiation, etc.) fit well with Gaussian distributions. So, in this paper we consider that sensed data are approximated by a Gaussian distribution and that they are correlated in both space and time.
This choice is motivated also by the fact that several analytical results are well known for Gaussian distribution and can be readily exploited to obtain the maximum lossless compression factor for correlated Gaussian sources.
As is well known when compression of discrete sources is considered, Shannon's entropy gives the lossless compression limit.
For Gaussian correlated data, under suitable assumptions and without loss of generality, it can be shown that, considering quantized variables X̂_1, ..., X̂_N obtained from the quantization of continuous Gaussian variables X_1, ..., X_N, the joint entropy is [51]

H(X̂_1, ..., X̂_N) = h(X_1, ..., X_N) = (1/2)·log2((2πe)^N · |Σ|),

where |Σ|, known as the generalized variance, is the determinant of the covariance matrix Σ and h(·) is the differential entropy. (Rigorously speaking, we have to distinguish between the differential entropy h(X) for a continuous source, i.e., before A/D conversion, and Shannon's information entropy H(X̂) for a discrete source, i.e., after the quantization introduced by the A/D conversion; however, it is straightforward to prove that their values coincide when a unitary quantization step is considered, as done in this paper.) In particular, for Gaussian sources with the same correlation coefficient ρ and variance σ², the generalized variance is |Σ| = σ^{2N}·[1 + (N − 1)ρ]·(1 − ρ)^{N−1} and therefore

H(X̂_1, ..., X̂_N) = (N/2)·log2(2πe·σ²·(1 − ρ)) + (1/2)·log2([1 + (N − 1)ρ]/(1 − ρ)).

Moreover, considering that for a broad range of values (i.e., ρ ∈ [0, 0.99], b ≥ 8) the second term is negligible, it follows that

H(X̂_1, ..., X̂_N) ≈ (N/2)·log2(2πe·σ²·(1 − ρ)).

Therefore, the ideal (maximum) lossless compression factor for Gaussian variables considering blocks of N correlated values of b bits each can be obtained as

F_ideal = N·b/H(X̂_1, ..., X̂_N) ≈ 2b/log2(2πe·σ²·(1 − ρ)).

Throughout the paper, we assume that F_TEC = F_DSC = F_ideal (i.e., the maximum lossless compression factor). As regards CS, the actual compression factor is related to the sparsity level S = K/N; in fact,

F_CS = N/M = N/O(K·log(N/K)).

Note that F_CS is a decreasing function of K. So we consider two cases: an ideal sparsity level K_ideal such that F_CS = F_ideal, and a slightly greater value K = 1.2·K_ideal.
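The joint-entropy formula above can be evaluated numerically; a minimal sketch, assuming the exact expression H = (1/2)·log2((2πe)^N·|Σ|) with |Σ| = σ^{2N}·[1+(N−1)ρ]·(1−ρ)^{N−1} (function name and numeric values are illustrative):

```python
import math

def ideal_compression_factor(n, b, rho, sigma):
    """Ideal lossless compression factor F_ideal = n*b / H for n equally
    correlated Gaussian sources quantized with unit step."""
    gen_var = (sigma ** (2 * n)) * (1 + (n - 1) * rho) * (1 - rho) ** (n - 1)
    joint_entropy = 0.5 * math.log2((2 * math.pi * math.e) ** n * gen_var)
    return n * b / joint_entropy
```

As expected, the achievable compression grows with the correlation: for example, with n = 45, b = 12, and σ = 100, the factor rises from about 1.38 at ρ = 0 to about 1.68 at ρ = 0.9.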
Finally, in the case of CRT, we consider that the simple MinDiff algorithm proposed in [4] is used for compression.
Basically, MinDiff encodes a set of uncompressed data D = {D_i} with another set of compressed data C = {μ, d_1, ..., d_N}, where μ = min{D_i} is the minimum of the values in D and the d_i are the differences d_i = D_i − μ, represented with b_d = ⌈log2(max{d_i} + 1)⌉ bits each. The number of bits b_d needed to represent the set of differences and the value of μ are necessary for proper reconstruction, and therefore an overhead of b + log2(b) bits must be considered.
Therefore, its compression factor considering blocks of N values of b bits each can be obtained as F_MinDiff = N·b/(N·b_d + b + log2(b)).
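The MinDiff sizing above can be sketched as follows (a minimal illustration; function names and the sample block are ours):

```python
import math

def mindiff_bits(block, b):
    """Total bits after MinDiff compression of one block of b-bit values:
    N differences of b_d bits plus the overhead (the minimum on b bits and
    b_d itself on ceil(log2(b)) bits)."""
    mu = min(block)
    diffs = [v - mu for v in block]
    b_d = math.ceil(math.log2(max(diffs) + 1))
    return len(block) * b_d + b + math.ceil(math.log2(b))

def mindiff_factor(block, b):
    """Compression factor F_MinDiff = N*b / compressed size."""
    return len(block) * b / mindiff_bits(block, b)
```

For a block of 12-bit values such as [100, 102, 101, 105], the differences fit in 3 bits each, giving 28 bits in total and a compression factor of roughly 1.7.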

Energy Model.
Similarly to other works (e.g., [16, 19]), we consider a simple energy model where, for each bit to be transmitted, a node spends an energy equal to E_b. Apparently, the model seems to neglect the energy needed for computation and reception, but this is not true if we consider that in sensor networks the number of bits transmitted, the number of bits received, and the number of processing operations are all proportional to the number of sensed measures. So the energy needed for computation and reception can easily be included in E_b. For instance, let us suppose that a node, for sensing and processing n_b bits of measures, needs an energy equal to n_b·E_s and that, using a proper compression technique with a compression factor equal to F, it reduces the number of bits to be transmitted from n_b to n_b/F. In this case, the overall energy is n_b·E_s + (n_b/F)·E_t, which we can rewrite as (n_b/F)·E_b by considering E_b = F·E_s + E_t. Finally, if the energy needed for reception must also be included and it differs from the energy used for transmission, considering that for almost all nodes the number of bits received is equal to the number of bits transmitted, it will be sufficient to use E_b = F·E_s + E_t + E_r. Therefore, the main simplification introduced by our model is that we consider E_b distance-independent; that is, we do not consider that E_b could be adaptively changed by the MAC layer on the basis of the distance between the source and destination nodes.
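The folding argument above can be checked with a small worked example; the per-bit energies E_s, E_t, E_r and all numeric values below are hypothetical:

```python
def effective_bit_energy(e_s, e_t, e_r, f):
    """Effective per-bit energy E_b = F*E_s + E_t + E_r, folding
    sensing/processing and reception costs into the transmission cost."""
    return f * e_s + e_t + e_r

def direct_energy(n_bits, f, e_s, e_t, e_r):
    """Direct accounting: sense/process all n_bits, then transmit and
    receive the compressed n_bits/f."""
    return n_bits * e_s + (n_bits / f) * e_t + (n_bits / f) * e_r
```

With n_bits = 1200, F = 4, and (E_s, E_t, E_r) = (1, 10, 8), both accountings give 6600 energy units, confirming that the single parameter E_b captures all three contributions.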

Performance Metrics
In order to estimate the energy efficiency of the above techniques, let us introduce the energy reduction factor, ERF_x, which represents the percentage reduction of the energy spent using a specific data gathering technique (E_x) compared to the case when raw measures are directly sent (E_RAW). This metric is defined as

ERF_x = (E_RAW − E_x)/E_RAW,

where x ∈ {TEC, CS, CRT, DSC}. When an ideal lossless network is considered, the ERF can be evaluated as

ERF_x = (E_b·B_RAW − E_b·B_x)/(E_b·B_RAW) = 1 − B_x/B_RAW,

where E_b is the energy spent by a node to transmit a single bit and B_x is the overall number of bits transmitted with the specific data gathering technique, derived in Section 4 (see (7)-(10)).
For instance, in the case of TEC-based data gathering, B_TEC = B_RAW/F_TEC and, as a consequence, ERF_TEC = 1 − 1/F_TEC. However, in the case of a lossy network, the numbers of transmitted and received bits differ and the expected energy reduction factor has to be expressed taking into account the actual number of bits forwarded.
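For a lossless network the metric reduces to a ratio of bit counts; a minimal sketch (function name and numbers are illustrative):

```python
def erf(bits_raw, bits_x):
    """Energy reduction factor ERF_x = 1 - B_x/B_RAW, valid for a lossless
    network where energy is proportional to transmitted bits."""
    return 1.0 - bits_x / bits_raw
```

For TEC with a compression factor of 4, B_x = B_RAW/4 and the ERF is 0.75, matching ERF_TEC = 1 − 1/F_TEC.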
For comparison purposes, we decided to evaluate the energies considering the nodes belonging to cluster 2 (i.e., CL_2).
We restrict our analysis to the nodes of the second cluster for two reasons. Firstly, these nodes are the most critical as they are the sink's neighbors: if these nodes run out of energy, the sink remains isolated. Secondly, network lifetime is defined as the time until the first node in the network dies and, with high probability if not certainty, this node belongs to CL_2, considering that all messages are routed to the sink through these nodes.
Finally, considering that network lifetime is related to the maximum energy consumed by a node, in this paper we also investigate the energy reduction factor related to the maximum energies:

ERF_max,x = (E_max,RAW − E_max,x)/E_max,RAW.

Concerning reliability, we consider that a node fails to forward a packet with probability p_f and evaluate the ratio R_x between the number of raw measurements obtained by the sink and the number of raw measurements generated by the source nodes or, equivalently,

R_x = 1 − n_lost,x/n_tot,

where n_lost,x is the number of raw measurements that are lost due to network and/or reconstruction errors and n_tot is the total number of raw measurements generated.
R_TEC can be easily estimated by assuming a perfect decoding technique where all received data are correctly decoded by the sink.
In fact, if h is the number of hops needed to reach the sink and p_f is the probability that a node fails to forward a packet, the probability that a packet is lost is p_l = 1 − (1 − p_f)^h and therefore the expected number of lost data is n_lost,TEC = p_l·n_tot. As a consequence, R_TEC = 1 − p_l. Note that the reliability of TEC techniques is related only to the network parameters p_f and h and cannot be improved without relying on channel coding techniques (e.g., FEC).
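The hop-loss calculation can be sketched directly (numeric values are illustrative):

```python
def packet_loss_prob(p_f, h):
    """p_l = 1 - (1 - p_f)**h: a packet is delivered only if all h
    forwarding steps succeed independently."""
    return 1.0 - (1.0 - p_f) ** h
```

For example, with p_f = 0.01 and h = 4 hops, p_l is about 0.0394, so R_TEC = 1 − p_l is about 0.961.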
Differently from TEC techniques, for all the other data gathering techniques reliability can be improved with proper settings of the design parameters. Nevertheless, a trade-off exists between reliability and energy saving, as briefly discussed below.
(i) CRT. In the case of CRT data gathering, the prime numbers can be selected so that all raw measures can be reconstructed even if at most a CRT components are lost.
As shown in [16], when a is fixed, the reliability can be estimated as

R_CRT = Σ_{i=0}^{a} C(N_CRT, i)·p_l^i·(1 − p_l)^{N_CRT − i},

where p_l = 1 − (1 − p_f)^h is the probability that a CRT component is lost.
As a general rule, high reliability can be obtained by fixing a so that a = N_CRT·p_l + δ, where δ is a small constant on the order of √N_CRT.
This result can be justified by considering that R_CRT (see (25)) can be approximated by the cumulative distribution function of a normal variable with mean N_CRT·p_l and variance N_CRT·p_l·(1 − p_l).
For instance, in Figure 1 we show the reliability R_CRT for different values of N_CRT and a when p_l = 0.04. As it is possible to observe, small values of a (e.g., a = 6) are sufficient to achieve high values of reliability (> 0.99).
Higher reliability can be obtained by further increasing the value of the parameter a. However, increasing a increases energy consumption too, so in the next section we investigate the trade-off between reliability and energy consumption for different values of a.
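The binomial model for the CRT reliability can be sketched as follows (function name and the N_CRT value below are ours, for illustration):

```python
from math import comb

def crt_reliability(n_crt, a, p_l):
    """R_CRT: probability that at most `a` of the n_crt CRT components are
    lost, each component being lost independently with probability p_l."""
    return sum(comb(n_crt, i) * p_l ** i * (1 - p_l) ** (n_crt - i)
               for i in range(a + 1))
```

With p_l = 0.04 and, say, N_CRT = 45 relayers, the expected number of lost components is only 1.8, so tolerating a = 6 failures already gives a reliability above 0.99, consistent with the behavior described above.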
(ii) DSC. Also in the case of DSC, the probability of losing a packet is p_l = 1 − (1 − p_f)^h. However, DSC-compressed measures cannot be recovered if the side information (i.e., the packets generated by the cell head) is not received.
So, in order to improve the reliability, we consider that the cell head transmits the side information n_{r,DSC} times.
In this case, the DSC reliability can be evaluated as

R_DSC = (1 − p_l^{n_{r,DSC}})·(1 − p_l),

that is, a compressed measure is recovered only if its own packet arrives and at least one of the n_{r,DSC} copies of the side information is received.
(iii) CS. By indicating with M the number of packets sent with CS, the probability of receiving at least M* packets is

R'_CS = Σ_{i=M*}^{M} C(M, i)·(1 − p_l)^i·p_l^{M−i}.

Therefore, by increasing M it is possible to guarantee that, with the desired probability R'_CS, at least M* packets are received by the sink. As a general rule, high reliability can be obtained by fixing M so that M = M*/(1 − p_l) + δ, where δ is a small constant.
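The rule of thumb for sizing M can be checked numerically; a minimal sketch with illustrative values (M* = 20, p_l = 0.04, δ = 2):

```python
from math import ceil, comb

def network_reliability(m_sent, m_star, p_l):
    """Probability that at least m_star of m_sent packets reach the sink
    when each packet is lost independently with probability p_l."""
    return sum(comb(m_sent, i) * (1 - p_l) ** i * p_l ** (m_sent - i)
               for i in range(m_star, m_sent + 1))

# rule of thumb from the text: M = M*/(1 - p_l) + delta
m_star, p_l, delta = 20, 0.04, 2
m_sent = ceil(m_star / (1 - p_l)) + delta  # 23 packets in this example
```

Sending 23 packets instead of the bare minimum of 20 pushes the probability of receiving at least M* packets well above 0.95 in this setting.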
However, differently from all the previous techniques, CS reliability is not related only to the number of received packets, so R'_CS, henceforward named network reliability, is not the actual CS reliability (this justifies why we used the notation R'_CS instead of R_CS). In fact, CS techniques are based on reconstruction algorithms that could introduce errors, so the reconstructed values x̂ could differ from the original raw data x.
Nevertheless, if we assume that raw data are quantized values (as usual in WSNs, where raw measures come from ADCs), we can state that quantized measures can be exactly recovered if the reconstruction error |x̂ − x| is smaller than the quantization error Δ/2, where Δ is the quantization step used for quantizing x.
Therefore, in our simulations the reconstruction error |x̂ − x| is evaluated for each measure, and a raw measure is considered lost if this error is greater than Δ/2.
From the simulation point of view, the actual CS reliability can be evaluated as

R_CS = 1 − n_lost,CS/n_tot,

where n_lost,CS is the number of lost measures considering both packet losses and reconstruction errors. We show in the next section that by choosing b_CS = b + 3 and M = M*/(1 − p_l) + 2 we can obtain R_CS ≈ 1.
Two reconstruction algorithms are considered for CS in this paper: ideal (i.e., oracle-based) reconstruction, where the exact positions of the nonzero values are assumed to be known at the sink node, and CoSaMP [27]. To distinguish between them, we indicate their reliabilities as R_CSI and R_CoSaMP, respectively.

Simulations Results
In this section, we compare the data gathering techniques in terms of energy consumption and reliability.
The results have been obtained through a custom C++ simulator. For each set of parameters, mean results are reported considering 20 random topologies where nodes are uniformly distributed in a square area of size 600 × 600 m², with density δ nodes/m². We also assume that the V events randomly occur in a faraway cluster (e.g., CL_5, so that h = 4 hops are needed to reach the sink) and that each event is detected by N = δ·π·r_s² source nodes (where r_s is the sense radius).
If not otherwise stated, V = 300 events are considered and raw data are represented with b = 12 bits (which is a typical resolution for the ADCs used in sensor networks).

ERF with Reliable Networks.
To assess the simulator, we first analyzed an ideal (fully reliable) WSN and evaluated the ERF of the different data gathering techniques (TEC, CRT, DSC, and CS) for different parameter values. In particular, for CS two cases have been considered: an ideal case (CSI) where the sparsity level S = K/N is fixed to the minimum value K = K_ideal such that F_CS = F_ideal (see (17) and (14)) and a second case (CS2) where a slightly greater value K = 1.2·K_ideal is used.
As shown in Figures 3 and 4, the results obtained through the analytical model (see (20)) and those reported by the simulator are very close to each other for all the parameter values considered. These results confirm the validity of our model. The same results are obtained for different values of m, that is, by changing the number of raw measures per packet. This can be easily justified by the fact that B_RAW and B_x are both proportional to m, so their ratio, and therefore also the ERF (see (19)), is independent of m. Simulation results for different values of m are thus not shown for the sake of space.
On the basis of the previous results, we can state that in the case of reliable networks TEC and DSC have a greater ERF and therefore it seems that they should be preferred to the other analyzed data gathering techniques. However, as shown in the next section, this result is not true when reliability is an issue.
Note that in our simulations ERF DSC and ERF TEC are almost the same because we fixed the same values for spatial and temporal correlation coefficients. Obviously, when spatial and temporal correlation are not the same different results can be obtained.
Finally, note that although CRT appears to be the worst in terms of ERF, it is the only data gathering technique where an actual compression algorithm (MinDiff) has been considered (for all the other techniques, ideal compression factors have been assumed). So the previous simulation results allow quantifying the penalty of using a simple compression algorithm (MinDiff) instead of more complex techniques (at least when Gaussian data are considered). By comparing Figures 3 and 5, we can see that CRT and CS achieve different values of ERF when the sensing radius r_s changes. In particular, ERF_CSI is no longer able to reach the same values as ERF_DSC for low values of r_s. This can be justified considering that, when r_s decreases, the number of source nodes N decreases and the sparsity level S = K/N increases; as a consequence, the compression factor F_CS decreases too (see (17)).
It is worth noting that our analytical model is able to anticipate this result.
Also in the case of CRT, the ERF slightly decreases for lower values of r_s due to the fact that, when r_s decreases, the overhead of the MinDiff algorithm becomes more relevant and F_MinDiff is not able to approach F_ideal (see (18)).
Similar considerations can be made about the node density δ. In fact, as it is possible to observe by comparing Figures 3 and 6, similar results are obtained when the density is changed without altering the overall number of source nodes (i.e., N = δ·π·r_s² = 45 in both cases).

ERF with Unreliable Networks.
In Figures 7 and 8, the reliability and the ERF of the data gathering techniques for p_f = 0.01 and different values of a and M are reported.
On the basis of the simulation results, we can state that even in the case of unreliable networks TEC and DSC have greater ERF (see Figure 8). However, their reliability is fully determined by the packet loss probability and cannot be improved; instead, by using CRT and CS, higher reliability can be achieved by increasing a and M, respectively.
In particular, as shown in Figure 7, by fixing a = 8 a reliability higher than 0.975 can be achieved for CRT, and even higher values can be obtained using CS when M = M*/(1 − p_l) + 2 CS measures are sent. It is important to note that the reliability plotted for CS is the actual reliability obtained after reconstruction with an ideal (oracle-based) reconstruction algorithm (i.e., R_CSI and not R'_CS).
By choosing M so that M = M*/(1 − p_l) + 2, the sink receives with high probability (i.e., R'_CS > 0.99) at least M* CS measures, that is, the minimum number of measures sufficient for reconstruction, and therefore the original measurements can be perfectly recovered (so that R_CS = 1).
In the next subsection, we will show that this is true only if b_CS = b + 3, as chosen for our simulations. Obviously, high reliability is achieved at the cost of a lower ERF but, by comparing Figures 8 and 3, we can state that the impact on the ERF is quite low (both ERF_CRT and ERF_CS decrease by a few percent).
As a consequence, when high reliability is needed even with unreliable networks, CRT and CS should be preferred.

CS Reliability.
In all the previous simulations, we fixed the number of bits b_CS used for representing the quantized CS measures to b + 3. A careful reader could observe that this choice is questionable and that by reducing b_CS higher values of ERF_CS can be obtained. This consideration is partially true: ERF_CS does increase for lower values of b_CS, but our simulation results show that with values of b_CS below b + 3 it is not possible to obtain perfect reconstruction (i.e., R_CS = 1).
To convince the reader, in Figure 9 we report simulation results on the actual reliability R_CS for different values of b_CS and different sparsity levels when b = 12. The simulation parameters are those reported in Figure 3.
As it is possible to observe, when b_CS < 15 the reliability quickly decreases.
Similar results have been obtained for different values of b.
However, in practice, the actual reliability depends also on the reconstruction algorithm used. For the sake of completeness, we report in Figure 10 the CS reliability when the CoSaMP [27] algorithm is used for reconstruction.
As it is possible to observe, also in this case b_CS = 15 is needed for achieving high reliability. Nevertheless, perfect reliability is not obtained with CoSaMP when M = M*. So in some cases CRT could be preferred to CS because its reliability can be better predicted.

ERF and Network Lifetime.
The ERF metric is an expression of the mean energy consumption; network lifetime, instead, is more closely related to the maximum energy consumption.
In Figure 11, we report ERF_max instead of ERF for the different data gathering techniques. ERF_max is evaluated on the basis of (22), that is, considering the maximum energy consumption of the nodes belonging to cluster CL_2. Maximum energies show greater variations than mean values, so in order to have higher confidence we increased the number of events to V = 3000. As it is possible to observe, TEC and DSC have higher ERF_max, and therefore greater network lifetime can be expected when DSC and TEC are used for data gathering.
Finally, note that CRT and CS have similar performance if the actual sparsity degree of CS is 20% higher than the minimum (ideal) value. These observations can be extended to all the previous simulations.

Conclusions and Future Works
In this paper, we have compared several data gathering techniques used in WSNs by means of both simulation results and analytical models. In particular, the effectiveness of the above techniques has been investigated in terms of reliability (packet loss and reconstruction errors) and energy efficiency (i.e., ERF and network lifetime) by systematically sampling the parameter space (i.e., number of nodes, transmission range, and sparsity). Basically, we can summarize our results as follows: (i) DSC and TEC techniques should be preferred for maximizing network lifetime.
(ii) CS should be preferred when high reliability is needed.
(iii) CRT should be preferred for its inherent low complexity.
As a consequence, we can state that there is no best solution for all possible applications and that only the trade-off between energy consumption, reliability, and complexity can drive the choice of the data gathering technique to be used for a specific application.
As future work, we plan to refine and improve the model to deal with actual correlated measurements (i.e., not only Gaussian data) and more realistic propagation channels (i.e., by taking into account actual distance between nodes).

Notation

TEC: Transform- and encoding-based compression
CS: Compressive sensing
DSC: Distributed source coding
CRT: Chinese Remainder Theorem
x: Generic data gathering technique; that is, x ∈ {TEC, CS, DSC, CRT}
D = {D_i}: Set of uncompressed values (raw measurements)
b: Number of bits used to represent a raw value (without compression)
m: Number of measurements/words per packet
N: Number of source nodes
X: Matrix of m × N raw data of b bits each
x_i: i-th column of the matrix X (measures taken by the N nodes almost at the same time)
F_x: Compression factor of the data gathering technique x
b_x: Number of bits used to represent a compressed measure with data gathering technique x
P_x: Number of bits per packet when data gathering technique x is used
B_x: Overall number of bits transmitted for each event when data gathering technique x is used
ERF_x: Energy reduction factor of the data gathering technique x
R_x: Reliability of the data gathering technique x
μ: Minimum value within the set {D_i}; that is, μ = min{D_i}
d_i: Difference with the minimum element of {D_i}; that is, d_i = D_i − μ
y: Compressed sensing measurement
M: Number of CS measurements
M*: Minimum/sufficient number of CS measurements for reconstruction
K: Number of nonzero components of a sparse signal
S: Sparsity level of CS measurements; that is, S = K/N
ρ: Correlation coefficient of raw measurements
σ: Standard deviation of raw measurements
r: Maximum transmission distance (coverage radius)
r_s: Sense radius of a node
N_CRT: Number of CRT relayers
a: Number of admissible failures (lost CRT components)
b_s: Length of the compressed sequence s split by the CRT relayers
p_f: Probability that a node fails to forward a packet
p_l: Packet loss probability; that is, p_l = 1 − (1 − p_f)^h
L: Grid size (edge in meters)
δ: Node density
E_b: Energy spent for each transmitted bit
H: Shannon's entropy
n_{r,DSC}: Number of retransmissions for the cluster/cell head when DSC is used
CL_h: Cluster number h
h: Number of hops needed to reach the sink
V: Number of events used for simulations