Hilbert-Curve Based Data Aggregation Scheme to Enforce Data Privacy and Data Integrity for Wireless Sensor Networks

Data aggregation techniques have been proposed for wireless sensor networks (WSNs) to address the problems presented by the limited resources of sensor nodes. The provision of efficient data aggregation to preserve data privacy is a challenging issue in WSNs. Some existing data aggregation methods for preserving data privacy are CPDA, SMART, the Twin-Key based method, and GP2S. These methods, however, have two limitations. First, the communication cost for network construction is considerably high. Second, they do not support data integrity. There are two methods for supporting data integrity, iCPDA and iPDA. But they have high communication cost due to additional integrity checking messages. To resolve this problem, we propose a novel Hilbert-curve based data aggregation scheme that enforces data privacy and data integrity for WSNs. To minimize communication cost, we utilize a tree-based network structure for constructing networks and aggregating data. To preserve data privacy, we make use of both a seed exchange algorithm and Hilbert-curve based data encryption. To support data integrity, we use an integrity checking algorithm based on the PIR technique by directly communicating between parent and child nodes. Finally, through a performance analysis, we show that our scheme outperforms the existing methods in terms of both energy efficiency and privacy preservation.


Introduction
With the proliferation of advanced mobile device and wireless communication technologies, wireless sensor networks (WSNs) are attracting increasing interest from both industry and research institutes [1][2][3]. Because sensor nodes have limited resources (i.e., battery and memory capacity), data aggregation techniques have been proposed for WSNs [4][5][6][7][8][9]. However, wireless communication can be overheard, and consequently data privacy in sensor networks is a crucial issue. Although data aggregation schemes that preserve data privacy have been proposed, they have the following limitations. First, the communication cost for network construction and data aggregation is considerably high. Second, the existing schemes do not guarantee data integrity in the presence of communication loss. Since the existing privacy-preserving schemes do not support privacy preservation and integrity protection simultaneously, it is necessary to carefully design an effective data aggregation scheme for recent applications of WSNs, such as military and environmental monitoring, where both privacy and integrity of the sensed data should be provided [10].
To resolve these problems, we propose a new energy-efficient and privacy preserving data aggregation scheme for WSNs. To reduce the communication cost of preserving data privacy, we propose a seed exchange algorithm for data aggregation. The seed generated by this algorithm is used not only to conceal the sensed data but also to preserve data privacy without additional message exchanges during the data aggregation step. For data privacy preservation, we also utilize a Hilbert-curve based technique, with which it is difficult to obtain the actual sensed data even if attackers overhear the transmission, because the data being sent are transformed by using a unique Hilbert value. To provide data integrity, we propose an integrity checking algorithm based on a private information retrieval (PIR) technique. Upon receiving aggregated data from child nodes, a parent node starts the integrity checking algorithm, in which it generates a message based on the PIR technique by multiplying two large prime numbers. By sending a PIR message to child nodes, the parent node can verify the aggregated data. Our integrity checking algorithm is more efficient than the existing schemes since it checks the data integrity between child and parent nodes, instead of checking all data during the communication. Therefore, our scheme requires low communication cost and yields an accurate aggregate result even in reasonably dense networks.
International Journal of Distributed Sensor Networks
This paper is organized as follows. In Section 2, we present related work on privacy preserving aggregation schemes in WSNs. In Section 3, we provide both considerations and attack models for designing an efficient privacy preserving aggregation scheme. In Section 4, we propose a new privacy preserving data aggregation scheme including a seed exchange algorithm in WSNs. In Section 5, we present a performance analysis of our scheme. Finally, we draw conclusions and suggest future work in Section 6.

Related Work
In this section, we present the existing data aggregation schemes for supporting data privacy and data integrity in WSNs. Privacy preserving data aggregation schemes include CPDA, SMART, Twin-key, and GP2S. He et al. [11] proposed a Cluster Based Private Data Aggregation (CPDA) method in which a cluster header aggregates data from cluster members. The CPDA method first constructs clusters to perform intermediate aggregations. All nodes within a cluster, including the head node, then share $n$ public seeds, where $n$ is the number of cluster members. Next, each node generates $n - 1$ private seeds and sends messages generated by using the public and private seeds together with its sensed data. Finally, the cluster head calculates the aggregate value by using its own private numbers and the received information. However, the CPDA method incurs high communication cost to perform data aggregation. He et al. [11] also proposed a Slice Mix AggRegaTe (SMART) method to preserve private data by using a data slicing technique. For this, each node randomly selects a set of nodes within $h$ hops and randomly slices its private data into $J$ pieces. One piece is kept on the node that sensed the data, and the remaining $J - 1$ pieces are encrypted and sent to the pre-selected nodes. When a node receives sliced data from its neighbors, it aggregates the received data and sends the result to the sink node. The SMART method also suffers from high communication cost, however, because each node must share its sliced data among neighboring nodes. Conti et al. [12] proposed a key-based private data preservation method called Twin-key. The Twin-key scheme prevents leakage of the sensed data during the data aggregation process and is robust to data loss. To provide robustness to data loss, twin-keys are set up during cluster construction [13], where two neighboring nodes share at least one common key corresponding to a hash value. 
Data aggregation is then performed twice along the Hamiltonian circuit, in which each node adds its sensed value to the partial aggregate value. At the same time, for each live twin-key it adds or removes a corresponding shadow value in accordance with the live announcement. As a result, each cluster head obtains the correct aggregate for the cluster. The cluster head then passes the aggregated value to the sink node by following a tree aggregation structure. However, the Twin-key method has high communication cost due to the process of live announcement and data aggregation. Finally, Zhang et al. [14] proposed the Generic Privacy Preservation Scheme (GP2S) for perturbed histogram-based aggregation. This scheme supports data aggregation for a variety of queries since it provides both individual data and aggregated data. For this, each sensor node is preloaded with a secure one-way hash function that maps a bit string to a value between 0 and $N - 1$, where $N$ is a system parameter. A sink node then sends out a query message with a threshold (i.e., data duration). After receiving the query, each sensor node sends its data processed by the hash function. Once the sink node has received the aggregated data from all child nodes, it determines the distribution of the sensed data readings. However, the accuracy of the aggregated value of the network data is low, and the data privacy can be broken by a data aggregator (parent node) having leaf nodes.
He et al. [15] also proposed the iPDA and iCPDA schemes to support integrity checking in WSNs, by extending their previous schemes, SMART and CPDA, respectively. To the best of our knowledge, these schemes are the first to address both privacy preservation and integrity protection for data aggregation in WSNs. The iPDA scheme utilizes the data slicing and assembling technique of SMART to preserve data privacy. It protects data integrity by utilizing two node-disjoint aggregation trees rooted at the query server, where each node belongs to a single aggregation tree. When the aggregated data from both aggregation trees are compared, the query server accepts the aggregate result if the difference between the aggregated data from the two trees does not exceed a predefined threshold value. Otherwise, it discards the aggregated result by considering it as polluted data. However, iPDA has some shortcomings. First, it is impractical to compare the aggregated values of two node-disjoint aggregation trees, because it cannot be expected that all nodes will reply to all requests, due to the unreliability of a WSN. Second, to secure the communication channel against adversaries, all sensor nodes use secret keys to encrypt all of their data slices before sending them to $2(J - 1)$ sensor nodes, where $J$ denotes the number of slices. Every sensor node thus has computational overhead from decrypting all the slices before aggregating them. Because encryption/decryption is an expensive operation for resource-constrained sensor nodes, iPDA has high computation cost. Third, the technique for slicing and assembling is only operable while the number of colluding sensor nodes stays below a certain threshold (i.e., the sum of out-degree and in-degree minus one). If the number of colluding sensor nodes exceeds the threshold, the sensor nodes may collaboratively reveal private information of other nodes. Although the threshold can be raised by increasing the number of slices, this will further increase communication overhead. 
Finally, since each sensor node has to transmit five to six messages on average, the iPDA scheme has high data propagation delay. Meanwhile, iCPDA requires three rounds of interactions. In this scheme, each node first sends a seed to the other cluster members. Next, each node hides its sensed data via the received seeds and sends the hidden sensed data to each cluster member. Each node then adds its own hidden data to the received data and sends the calculated result to its cluster head. To enforce data integrity, cluster members check the aggregated data transmitted by the cluster head. However, iCPDA has some disadvantages. First, its communication overhead increases significantly with the cluster size. Second, its computational overhead increases rapidly as the cluster size grows, whereas a smaller cluster size lowers the privacy-preserving efficacy. Finally, iCPDA has high data propagation delay due to its three rounds of interactions.
It is thus necessary to design a new data aggregation scheme that supports both data privacy and data integrity. The new scheme should be reliable and efficient in terms of energy consumption, propagation delay, and the accuracy of the aggregated result.

Design Considerations
In this section, we present requirements for a data aggregation scheme to support both data privacy and data integrity.
The desired data aggregation scheme should satisfy the following criteria.
(1) Data privacy: privacy concern is one of the major obstacles to civilian applications for wireless sensor networks. Curious individuals may attempt to gather more detailed information by eavesdropping on the communications of their neighbors. It is increasingly important to develop data aggregation schemes to ensure data privacy against eavesdropping.
(2) Data integrity: since data aggregate results may be used to make critical decisions, a base station needs to guarantee the integrity of the aggregated result before accepting it. Therefore, it is crucial that data aggregation schemes can protect the aggregated results from being polluted by attackers.
(3) Efficiency: data aggregation achieves bandwidth efficiency through in-network processing. In integrity-protecting private data aggregation schemes, additional communication overhead is unavoidable to achieve the additional features. However, the additional overhead must be kept as small as possible.
(4) Accuracy: an accurate aggregate result of sensed data is generally desired. Therefore, we should take accuracy as a criterion to evaluate the performance of integrity-protecting private data aggregation schemes. When accurate aggregate results are needed, schemes based on randomization techniques are not applicable.
On the other hand, there exist multiple potential attacks against a data aggregation scheme. Some attacks aim at disrupting the normal operation of the sensor network, such as routing attacks and denial of service (DoS) attacks. A number of previous efforts have addressed these behavior-based attacks. In this paper, we do not consider those attacks; our major concern is the types of attacks that try to break the privacy and/or integrity of aggregate results. We assume that a small portion of sensor nodes can be compromised and focus on defending against the following categories of attacks in wireless sensor networks.
(1) Eavesdropping: in an eavesdropping attack, an attacker attempts to obtain private information by overhearing transmissions over its neighboring wireless links or colluding with other nodes to uncover the private information of a certain node. Eavesdropping threatens the privacy of data held by individual nodes.
(2) Data pollution: in a data pollution attack, an attacker tampers with the intermediate aggregate result at an aggregation node. The purpose of the attack is to make the base station receive a wrong aggregate result with large deviation from the original result, which leads to improper or wrong decisions. In this paper, we do not consider the attack where a node reports a false reading value, and we assume that the impact of such an attack is usually limited. By using privacy preservation measures, individual sensory data are hidden. However, not only the sensory data but also the aggregated value of a small group of sensors must be in a reasonable range. This implies that if a malicious user pollutes the individual sensory data (at a lower level in the aggregation tree), it can be easily detected since this introduces a large deviation from the original data. Therefore, a more serious concern is the case where an aggregator close to the root of the aggregation tree is malicious or compromised.
In this paper, our goal is to design a reliable and efficient data aggregation scheme in terms of energy consumption, propagation delay, and accuracy of the aggregated result by following these design considerations.

Data Aggregation Scheme to Enforce Data Privacy and Data Integrity
In this section, we present a novel Hilbert-curve based data aggregation scheme that supports both privacy preservation and integrity protection for wireless sensor networks. In order to support data privacy, we first provide a data privacy preserving algorithm that uses sensor nodes' seeds and Hilbert-curve values. A seed exchange algorithm is applied to reduce the number of messages during data aggregation. In order to support data integrity, we provide a private information retrieval (PIR) [16][17][18][19] based integrity checking algorithm in which a child node and its parent node communicate by exchanging a PIR message and its response message.

Privacy Preserving Algorithm.
For wireless sensor networks, we provide a novel privacy preserving algorithm that uses a Hilbert-curve technique [20] and seed exchanges among sensor nodes. Our privacy preserving algorithm is performed through three phases: a network construction phase, a data encryption phase, and a data transmission phase. In the network construction phase, each node determines its sibling nodes, parent node, and child nodes by sending broadcast messages. Each node exchanges a seed with other nodes among its sibling nodes. In the data encryption phase, each node changes the sensed data into a value by using its generated seed and the received seeds. The changed value is encrypted by the Hilbert-curve algorithm. Finally, in the data transmission phase, each sensor node sends the aggregated data to its parent node, where all the data from child nodes are merged with the parent's own encrypted data. A sink node aggregates all data of the sensor nodes in the network. We explain each step in detail in the following.

Network Construction Phase.
Our privacy preserving algorithm chooses a tree-based topology to perform intermediate aggregations. Note that we do not use a clustering-based topology because it is affected by the communication range between cluster heads and it requires a large number of messages for constructing the network. First, a sink node triggers a query by sending a HELLO message generated from a message flooding scheme [21], as shown in Figure 1(a). Upon receiving the HELLO message, a sensor node determines whether the HELLO message is from the sink node or not. If a sink node is located within its communication range, the sensor node receives the HELLO message from the sink node and sets the sink node as its parent node. Otherwise, the sensor node waits for a certain period of time to receive the HELLO message from its sibling nodes and then selects one of the sibling nodes as its parent node by broadcasting a JOIN message. The sensor node then forwards the HELLO message to its sibling nodes with its corresponding level (Figure 1(b)).
In this procedure, we set the maximum number of child nodes so as to avoid network imbalance. If the network is imbalanced, the sensor nodes in the imbalanced area may consume more energy than those in other areas. Therefore, we define the maximum number of child nodes as given below.
Definition 1. Let the Error Rate be the average rate of message loss from a sensor node, and let a weight be a value for the density of a sensor network. The maximum number of child nodes is defined by the following equation, where Network Area is the size of the network and Communication Range is the communication boundary reachable from a sensor node:
MIN (# of neighbors, …) (1)
Figure 1 shows an example of our network construction algorithm and Algorithm 1 describes it. In Algorithm 1, first, a sink node floods a HELLO message to the nearest nodes within its communication range (lines 1∼2). A node that receives the HELLO message from the sink node sets its own level and broadcasts the HELLO message to other nodes (lines 3∼12). If a node receives a JOIN message, it sets the node sending the JOIN message as a parent node. A parent node that already has the maximum number of child nodes sends the child nodes a RESET message informing them that they are allowed to link to another node as a parent (lines 13∼18).
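The HELLO/JOIN procedure can be illustrated with a minimal Python sketch. It assumes a static neighbor map and a breadth-first flooding order, and it takes the child limit as a plain parameter `max_children` instead of computing it from Definition 1. The function name `build_tree` and the data layout are hypothetical, and the RESET handling of Algorithm 1 is reduced to a full parent simply not accepting further children.

```python
from collections import deque


def build_tree(neighbors, sink, max_children):
    """Flood HELLO from the sink; each node joins the first sender with a free slot.

    neighbors: dict mapping a node to the list of nodes in its communication range.
    Returns (parent, level) dictionaries describing the aggregation tree.
    """
    parent = {sink: None}
    level = {sink: 0}
    child_count = {node: 0 for node in neighbors}
    queue = deque([sink])
    while queue:
        sender = queue.popleft()              # node currently broadcasting HELLO
        for node in neighbors[sender]:
            if node in parent:
                continue                      # node already has a parent
            if child_count[sender] >= max_children:
                break                         # sender is full; node must join elsewhere
            parent[node] = sender             # JOIN: node selects sender as its parent
            level[node] = level[sender] + 1
            child_count[sender] += 1
            queue.append(node)
    return parent, level


# Hypothetical five-node topology with sink "A" and at most two children per node.
topology = {
    "A": ["B", "C", "D"],
    "B": ["A", "C", "E"],
    "C": ["A", "B", "D"],
    "D": ["A", "C"],
    "E": ["B"],
}
parent, level = build_tree(topology, "A", max_children=2)
```

In this toy topology node D hears the sink directly, but the sink already has two children, so D joins through C and ends up one level deeper.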

Data Encryption Phase.
Algorithm 1 (partial listing): network construction algorithm.
     Set parentID, recHopCnt, recLevel from message; NetInfo.curEntry++;
(8)  If (curHopCnt > recHopCnt + 1) curHopCnt = recHopCnt + 1;
(9)  else break;
(10) If (TOS_LOCAL_ADDRESS is not leaf node)
(11)     Flooding(currentLevel, currentNodeID);
(12) If (msgType is JOIN)
(13)     If (parent node does not exceed the maximum number of child nodes)
(14)         NetInfo.Parent = parentID;

After constructing a sensor network, each node generates random seed data for seed exchange. For this, we utilize an elliptic-curve key exchange algorithm that exchanges its own data by using a public elliptic curve, an arbitrary point, and its secret constant key. Figure 2 shows the flow of the elliptic key exchange algorithm. First, a source node and its neighboring node (receiving node) each set a private constant key, for example, pSender and pReceiver. Second, each node computes a result by multiplying an arbitrary point on the public elliptic curve by its private constant key. Third, each node transmits the result to the neighboring node. Finally, each node calculates the seed data by multiplying the received result by its own private constant key. The seed data are the sum of the x-coordinate and the y-coordinate of the resulting point, because the elliptic curve is a 2-dimensional equation. Because the elliptic key exchange algorithm allows each node to agree on a seed without unnecessary messages, its own data can be protected from an attacker during communication.
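The key exchange steps can be traced with a small Python sketch. It is only an illustration under stated assumptions: the curve y^2 = x^3 + 2x + 2 over F_17 with base point (5, 1) is a textbook toy curve chosen here for readability, not a parameter taken from the scheme, and the helper names `ec_add`, `ec_mul`, and `seed_from` are hypothetical. The names `p_sender` and `p_receiver` mirror pSender and pReceiver above.

```python
P_MOD, A = 17, 2      # toy curve y^2 = x^3 + 2x + 2 over F_17; far too small for real use
G = (5, 1)            # public base point on that curve (assumed for illustration)


def ec_add(p1, p2):
    """Add two curve points; None stands for the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None                                   # p2 is the inverse of p1
    if p1 == p2:                                      # tangent slope for doubling
        slope = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD)
    else:                                             # chord slope for distinct points
        slope = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)
    slope %= P_MOD
    x3 = (slope * slope - x1 - x2) % P_MOD
    y3 = (slope * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def ec_mul(k, point):
    """Multiply a point by the scalar k using double-and-add."""
    result = None
    while k:
        if k & 1:
            result = ec_add(result, point)
        point = ec_add(point, point)
        k >>= 1
    return result


def seed_from(private_key, received_point):
    """Seed = x-coordinate + y-coordinate of the shared point."""
    x, y = ec_mul(private_key, received_point)
    return x + y


p_sender, p_receiver = 3, 7                           # private constant keys
pub_sender = ec_mul(p_sender, G)                      # transmitted to the receiver
pub_receiver = ec_mul(p_receiver, G)                  # transmitted to the sender
seed_a = seed_from(p_sender, pub_receiver)
seed_b = seed_from(p_receiver, pub_sender)
```

Both nodes arrive at the same seed because scalar multiplication commutes, while an eavesdropper only observes `pub_sender` and `pub_receiver`. The modular inverse via `pow(x, -1, m)` requires Python 3.8 or later.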
The seed is used for hiding the original data from an adversary. The principle underlying our seed exchange method is as follows. The original data are changed by subtracting the seed value that the node generates and sends to another node. The seed value received from another node is then added. As a consequence, the sensed data are hidden among the seed exchange group members. The following equation gives the final value sent by each node for data aggregation, where $n$ is the number of seeds received from other nodes. Figure 3 shows the sensed data encryption result on each sensor node after exchanging a seed:
$$\text{processed value} = \text{original value} - \text{generated seed} + \sum_{i=1}^{n} \text{received seed}_i$$
To process a user's query, a parent node aggregates its changed data and all data received from its child nodes. Next, the parent node transforms the aggregated result into two-dimensional encrypted data by using the Hilbert curve [15]. The Hilbert curve, a variant of the space-filling curve proposed by G. Peano, transforms multidimensional data into 1-dimensional data. The Hilbert curve is a continuous fractal space-filling curve that gives a mapping between 1D and 2D space while preserving locality. The coordinates of a point $(x, y)$ projected onto the unit square can be changed into a distance value from the start point to this point. To adapt the Hilbert curve to our algorithm, we assume that each sensor node transforms the one-dimensional sensed value into two-dimensional data. Here, the one-dimensional value is the aggregated value after applying the seed exchange algorithm for each node group. The two-dimensional data are the coordinates of the aggregated value along the Hilbert curve in a $2^l \times 2^l$ matrix, where $l$ is the level of the curve. For this, we set as keys both the level and the direction of the Hilbert curve. We can encrypt the aggregated data using two-dimensional data $(x, y)$ into a tuple of ⟨key($d$, $l$), $x$, $y$⟩, where $l$ is the level and $d$ is the direction. 
For example, the aggregated value 4 of node 8 can be encrypted into ⟨key(Bottom, 2), 1, 1⟩ since its transformed value, level, and direction are (1, 1), 2, and Bottom, respectively, in Figure 4. Node 5 receives the encrypted data ⟨key(Bottom, 2), 1, 1⟩ and ⟨key( , 2), 3, 2⟩ from its child nodes 8 and 9, respectively. The encrypted data from the child nodes should be changed into ⟨key( , 2), 2, 1⟩ and ⟨key( , 2), 2, 0⟩ by following the curve direction and the level of node 5. Then, node 5 aggregates their data and sends the aggregated data ⟨key( , 2), 3, 2⟩ to its parent node. Algorithm 2 describes our data encryption algorithm. First, each node generates a Hilbert curve direction and a level based on the data (lines 1∼2). Next, each node encrypts the data by the Hilbert curve (line 3). Finally, each node packs the encrypted data for sending the aggregated data to its parent node (line 4).
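The mapping between a one-dimensional value and a grid coordinate can be sketched in Python with the widely used iterative Hilbert conversion. This is a hedged illustration only: it fixes one orientation of the curve, whereas the scheme above also keys the encoding on a direction, so the coordinates produced here need not match those of Figure 4. The function names `d2xy`, `xy2d`, and `_rotate` are hypothetical.

```python
def _rotate(n, x, y, rx, ry):
    """Rotate/flip a quadrant so that sub-curves connect end to end."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y


def d2xy(level, d):
    """Map a distance d along a level-`level` Hilbert curve to (x, y)."""
    n = 1 << level                # the grid is 2^level x 2^level
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rotate(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def xy2d(level, x, y):
    """Inverse mapping: grid cell (x, y) back to its distance along the curve."""
    n = 1 << level
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rotate(n, x, y, rx, ry)
        s //= 2
    return d


# A level-2 curve covers the values 0..15 on a 4 x 4 grid.
cells = [d2xy(2, value) for value in range(16)]
```

Consecutive values always land in neighboring cells, which is the locality property mentioned above. A receiver that knows the level and orientation can invert the mapping with `xy2d`.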

Data Transmission Phase.
In the data transmission phase, each node sends the encrypted data to its parent node. The parent node then analyzes the encrypted data (e.g., key, curve direction, and curve level) received from the child node. If the curve direction and level of its child node are different from its own curve direction and level, the node should transform the received value based on its own curve direction and level. In this way, a sink node aggregates all of the encrypted data from the hierarchy of nodes. To avoid communication loss in wireless sensor networks, we utilize a Time Division Multiple Access (TDMA) method [22] for data transmission. Definition 2 explains the principle used to decide the start time of data transmission. Each child node sends the encrypted data at its own transmission time. Algorithm 3 shows our data aggregation algorithm. We start data aggregation from the leaf nodes (lines 1∼2). For aggregation, an intermediate node (InternalNode) receives the data from its child node and re-encrypts the data together with its own data (lines 3∼11). In this way, all encrypted data of the sensor nodes reach a sink node. Finally, the sink node sends the aggregated data to the service client (lines 12∼15).

Algorithm 3 (partial listing): data aggregation algorithm.
     If (the node is InternalNode) {
(6)      Store encData from msg;
(7)      decryptedData = decryption(encData);
(8)      aggregatedData += decryptedData;
(9)      newEncData = HilbertCurve(direction, curveLevel, aggregatedData);
(10)     If (all data is received from childNode)
(11)         SendMessage(newEncData) to ParentNode;
(12) } If (a node is SinkNode) {
(13)     Store encData from msg;
(14)     decryptedData = decryption(encData);
(15)     SendMessage(decryptedData) to User; } }
End Algorithm
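The leaf-to-sink flow can be sketched as follows. This is a simplified illustration, assuming a SUM aggregate and a schedule in which deeper nodes transmit before shallower ones. It omits the Hilbert-curve encryption and the TDMA slot computation of Definition 2, and the function name `aggregate_to_sink` is hypothetical.

```python
def aggregate_to_sink(parent, level, values):
    """Sum masked values up the tree, deepest nodes transmitting first.

    parent: dict node -> parent node (None for the sink)
    level:  dict node -> depth in the tree (0 for the sink)
    values: dict node -> value the node contributes to the aggregate
    """
    partial = dict(values)
    senders = [node for node in parent if parent[node] is not None]
    # Deeper nodes send earlier, so a parent holds all child data before it sends.
    for node in sorted(senders, key=lambda n: -level[n]):
        partial[parent[node]] += partial[node]
    sink = next(node for node in parent if parent[node] is None)
    return partial[sink]


# Hypothetical four-node tree: A is the sink, D is a child of B.
tree_parent = {"A": None, "B": "A", "C": "A", "D": "B"}
tree_level = {"A": 0, "B": 1, "C": 1, "D": 2}
masked = {"A": 0, "B": 4, "C": 5, "D": 6}
total = aggregate_to_sink(tree_parent, tree_level, masked)
```

Each node transmits exactly one message to its parent, which is the property that keeps the communication cost of a tree-based structure low.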

Integrity Checking Algorithm.
Our integrity checking algorithm is performed through three phases: a PIR message construction phase, a PIR response phase, and an integrity checking phase. In the PIR message construction phase, upon receiving the encrypted data from a child node, the parent node constructs a PIR message and sends the message to the child node to check data integrity. In the PIR response phase, a child node responds with a result message by calculating row values based on the PIR message received from its parent node. Finally, in the data integrity checking phase, the parent node checks whether the data from its child node are valid by comparing two values, that is, the first received value and the second value.

PIR Message Construction Phase.
A parent node generates a PIR message to verify the value processed by its child node. The PIR technique was proposed to guarantee the exact result without revealing a client's desired information [18]. For this, it partitions the whole data space into a regular grid of $n \times n$ cells. A client then performs modular computation in which the desired cell is set to be a Quadratic Nonresidue (QNR) and the other cells are set to be Quadratic Residues (QR). A server then encrypts the dataset through a large number of computations, and the user computes the result with the area of QNR. A set of QR and QNR is calculated by using Definition 3. Here, $\mathbb{Z}_N^*$ is the set of integers in $\mathbb{Z}_N$ that are relatively prime to $N$.

Algorithm 4 (partial listing): PIR message construction algorithm.
command PIR_Message (int received_data, int child_ID, int PIR_p, int PIR_q)
(1) PIR_init_data = choose_initial_value(received_data, );
(2) HCx = choose_HCx_coord(HC_dir, received_data, PIR_init_data);

Definition 3. A set of QR and QNR.
Let $N = q_1 \cdot q_2$, where $q_1$ and $q_2$ are large prime numbers:
$$\mathbb{Z}_N^* = \{x \in \mathbb{Z}_N \mid \gcd(N, x) = 1\},$$
$$QR = \{y \in \mathbb{Z}_N^* \mid \exists x \in \mathbb{Z}_N^* : y = x^2 \bmod N\},$$
$$QNR = \{y \in \mathbb{Z}_N^* \mid y \notin QR\}.$$
However, the PIR technique is not suitable for sensor networks because its communication cost is very high when sending the whole domain partitions. To adapt the PIR technique to our algorithm, it is necessary to downsize the $n \times n$ domain index to $m \times m$ ($1 < m \le n$) so that the PIR technique becomes applicable to a wireless sensor environment. For this, we first compute $m$ based on the available message size in a sensor network. For example, if the maximum size of one message in a sensor network is 23 bytes, the candidate values for $m$ are 2, 3, and 4 owing to $m^2 \le 23$. Because we use the Hilbert curve technique for our privacy preserving algorithm, we can select 2 or 4 for $m$. If we select 4 for $m$, the basic range of the value that is sent is from 0 to 15. Second, the parent node subtracts a value $x$ from the data received from its child node, so that the modified value ranges between 0 and $m^2 - 1$. The value $x$ is transformed into $f(x)$ by using a data transformation function $f$ that is randomly selected from the given function pool. Because the ID of $f(x)$ is encrypted by using the Hilbert curve technique, it is difficult to obtain the value $x$. The value $x$ is encrypted by transforming it into two-dimensional data using the Hilbert curve technique. Third, the parent node sets two large prime numbers and computes a set of QR and QNR. Finally, the cell whose Hilbert ID is the same as the modified value is set to be QNR and the others are set to be QR. Table 2 shows our PIR message structure.
Algorithm 4 shows our PIR message construction algorithm. First, a parent node randomly selects a subtracted value and calculates a modified value (line 1). Second, the node converts the modified value into two-dimensional data (line 2). Third, the parent node selects two large prime numbers (PIR_p and PIR_q) in order to obtain the set of QR and QNR. The cell whose Hilbert ID is the same as the modified value is set to be QNR and the others are set to be QR (lines 3-9). Finally, the node sends the transformed subtracted value together with the group of QR and QNR values (line 10).
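The construction of the QR/QNR group can be sketched in Python. This is a minimal illustration with toy primes, assuming that QR and QNR are taken among the integers coprime to the modulus and that one grid cell occupies one byte of the message. The helper names `units`, `quadratic_residues`, `build_pir_query`, and `max_grid_side` are hypothetical, and the sketch leaves out the data transformation function and the Hilbert encoding of the subtracted value.

```python
import random
from math import gcd


def units(n):
    """Integers in [1, n) that are coprime to n."""
    return [x for x in range(1, n) if gcd(x, n) == 1]


def quadratic_residues(n):
    """Squares of the units modulo n, sorted."""
    return sorted({x * x % n for x in units(n)})


def max_grid_side(max_bytes):
    """Largest power-of-two side m with m * m <= max_bytes (one byte per cell assumed)."""
    m = 1
    while (2 * m) ** 2 <= max_bytes:
        m *= 2
    return m


def build_pir_query(p, q, m, target_col, rng=random):
    """Return (N, query): m numbers, a QNR at target_col and QRs elsewhere."""
    n = p * q
    qr = quadratic_residues(n)
    qnr = [x for x in units(n) if x not in qr]
    query = [rng.choice(qr) for _ in range(m)]
    query[target_col] = rng.choice(qnr)   # only the parent knows which column this is
    return n, query


side = max_grid_side(23)                  # 23-byte messages allow a 4 x 4 grid
modulus, query = build_pir_query(3, 7, side, target_col=1, rng=random.Random(0))
```

With the toy primes 3 and 7, the residues among the units of 21 are 1, 4, and 16, so a message such as (1, 8, 4, 16) carries three QRs and one QNR. Real deployments would need primes large enough that a child node cannot factor the modulus.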

PIR Response Phase.
In the PIR response phase, a child node makes a response message by using both its processed data and the PIR message from its parent node. First, the child node finds the Hilbert value that equals the modified value by subtracting $x$ from the original data. A PIR response message consists of $m$ values, one for each row of the $m \times m$ grid cells. Because the value of each grid cell is 0 or 1 in the two-dimensional grid cells ($m \times m$), the PIR response message can be expressed by $m$-bit data. Definition 4 shows how to generate a response value for each column.
Definition 4. Assume that the data set of a row is $r_1, r_2, \ldots, r_m$ and the data set of a column is $c_1, c_2, \ldots, c_m$; the rule for generating the value of column $j$ is as follows. The representative value of each cell can be calculated by using (6).
Algorithm 5 shows our PIR response message construction phase. First, a child node extracts $x$ from its processed value (line 1). Second, the child node finds the Hilbert ID of the result (lines 2-3). Third, the child node generates the $m^2$ cell data based on Definition 3 (line 4). It then constructs the PIR response message by using (6) (lines 5-12). Finally, the child node sends the PIR response message to its parent node (lines 13-14).
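A response computation of this kind can be sketched in Python. Since the formula itself is not reproduced in the text above, the sketch falls back on the standard Kushilevitz-Ostrovsky rule as an assumption: for each row, a query number is multiplied in as is when the cell holds 1 and squared when the cell holds 0. The function name `pir_response` and the example grid are hypothetical. The grid was chosen so that the result equals the four numbers 2, 16, 4, and 4 that appear in the worked example of this paper.

```python
def pir_response(grid, query, modulus):
    """Compute one response value per grid row.

    grid:  list of rows, each a list of 0/1 cell values
    query: one number per column (QRs, plus a QNR in the column being checked)
    """
    response = []
    for row in grid:
        z = 1
        for bit, y in zip(row, query):
            factor = y if bit else y * y      # squaring always yields a QR
            z = z * factor % modulus
        response.append(z)
    return response


# Hypothetical 4 x 4 grid; the query (1, 8, 4, 16) modulo 21 comes from the example.
grid = [
    [1, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]
answer = pir_response(grid, [1, 8, 4, 16], 21)
```

Under this rule a row's value is a QNR exactly when the row holds a 1 in the column where the parent placed its QNR, which is what lets the parent read that single column without disclosing which one it asked for.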

Integrity Checking Phase.
In the integrity checking phase, a parent node analyzes the PIR response message and determines whether the data received from its child node are valid. The parent node checks the QR and QNR of the received data by using the two selected prime numbers (in the second phase) and the Jacobi symbol. If the received data are valid, there exist $m - 1$ QRs and one QNR. Otherwise, the received data are not valid. Algorithm 6 shows our integrity checking algorithm for the received data. First, a parent node finds the QR and QNR for all columns (lines 1-2). Second, if QNR is set to the column of the modified value, the parent node concludes that the processed data from its child node are valid. Finally, the parent node also checks the validity for QR (lines 3-8).
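The parent-side test can be sketched in Python. This is a minimal illustration, assuming the parent classifies each response value with Euler's criterion modulo each of its two secret primes (a value is a QR modulo their product only if it is a QR modulo both) and accepts when exactly one value, at the expected position, is a QNR. The function names `legendre`, `is_qr`, and `check_integrity` are hypothetical.

```python
def legendre(a, p):
    """Legendre symbol of a modulo an odd prime p via Euler's criterion."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def is_qr(z, p, q):
    """z is a quadratic residue modulo p*q iff it is one modulo p and modulo q."""
    return legendre(z, p) == 1 and legendre(z, q) == 1


def check_integrity(response, p, q, expected_index):
    """Accept only if exactly one value is a QNR and it sits at expected_index."""
    cells = [0 if is_qr(z, p, q) else 1 for z in response]
    return cells.count(1) == 1 and cells[expected_index] == 1


# Response values 2, 16, 4, 4 with primes 3 and 7, as in the worked example.
valid = check_integrity([2, 16, 4, 4], 3, 7, expected_index=0)
tampered = check_integrity([16, 16, 4, 4], 3, 7, expected_index=0)
```

Only the parent knows the two primes, so a node that alters a response value cannot tell whether its replacement is a QR or a QNR.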

Example.
To protect both data privacy and data integrity, our scheme performs six phases: a network construction phase, a data encryption phase, a PIR construction phase, a PIR response phase, a data integrity checking phase, and a data transmission phase. In the network construction phase, each node sets the information of its sibling nodes, parent node, and child nodes. In Figure 5(a), a sink node A triggers a query by a HELLO message. Upon receiving the HELLO message, the two sensor nodes within the communication range of the sink determine whether the HELLO message is from the sink node. When they receive the HELLO message from A, they set the sink node as their parent node. The other five sensor nodes wait for a certain period of time to receive a HELLO message from their neighbors. Upon receiving the HELLO message from any node, a node selects one of its neighboring nodes as its parent node by broadcasting a JOIN message. Figure 5(b) shows the constructed sensor network. After constructing the network, each node exchanges a seed with one node among its neighboring nodes located within its communication boundary. Figure 3 shows the process of the seed exchange. Each node changes its sensed data by using the generated seed and the received seeds. All sensor nodes calculate the seed for aggregation; the resulting values for the eight nodes are −3, 5, 1, −3, −3, 1, 0, and 2. These values sum to zero, so the seeds cancel out in the final aggregate.
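The effect of the seed exchange on individual readings and on the aggregate can be sketched in Python. This is a small illustration, assuming the rule described earlier: a node subtracts each seed it generates and sends, and adds each seed it receives. The readings, the exchange list, and the function name `mask_readings` are hypothetical.

```python
def mask_readings(readings, exchanges):
    """Apply seed exchanges to raw readings.

    readings:  dict node -> sensed value
    exchanges: list of (sender, receiver, seed) triples
    """
    processed = dict(readings)
    for sender, receiver, seed in exchanges:
        processed[sender] -= seed       # generated seed is subtracted by its owner
        processed[receiver] += seed     # received seed is added by the recipient
    return processed


raw = {"B": 7, "C": 3, "D": 5}
seed_exchanges = [("B", "C", 4), ("C", "D", 2), ("D", "B", 1)]
masked_values = mask_readings(raw, seed_exchanges)
```

Every seed is subtracted once and added once, so the net offsets cancel and the sum of the masked values equals the sum of the raw readings, while no single transmitted value reveals a node's own reading.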
In the data encryption phase, the changed value is encrypted by a Hilbert curve algorithm before the sensed data (or aggregated data) are sent to the parent node. By selecting the direction and the level of the Hilbert curve, we can encrypt a value as a tuple consisting of key(direction, level) followed by the two coordinates of the value's two-dimensional position on the curve. For example, the value 14 is encrypted as ⟨key(Bottom, 2), 2, 1⟩ because its transformed value, level, and direction are (2, 1), 2, and Bottom, respectively. In the PIR message construction phase, a parent node constructs a message by using Definition 4 and sends the message to a child node for checking data integrity. In the PIR sending phase, a parent node constructs a message in which exactly one number is a QNR and all the other numbers are QRs. For example, when the row size is 4 and the two prime numbers are 3 and 7, the modulus is 21; among the values coprime to 21, the QRs are 1, 4, and 16, and the remaining values are QNRs. If a node B receives a value 6 from its child node C, B sets the column of the cell (1, 3) to a QNR value while setting the other columns to QR values. Using one QNR and three QRs, node B sends the set (1, 8, 4, 16) to its child node to check the validity of the received data. In the PIR response phase, a child node sends to its parent node the values calculated by using (6). For example, from the received data 1, 8, 4, and 16, a child node calculates four numbers, that is, 2 * ((1 * 8 * 16) % 21), 16, 4, and 4, as shown in Figure 6(b). For this, the cell being sent to the parent is represented by one while the other cells are randomly set to zero or one by using Definition 2. If an adversary pollutes the original data, the pollution can be detected from the PIR response. Because the adversary cannot know which column holds the QNR, he/she cannot produce appropriate PIR values.
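To make the Hilbert-curve transformation concrete, the following sketch maps a one-dimensional value to a two-dimensional cell using the widely used iterative index-to-coordinate conversion. It is an illustration only: the function name `hilbert_d2xy` is hypothetical, coordinates are 0-based, and the curve uses one fixed standard orientation. The scheme's direction parameter and key derivation are not modeled, so the coordinates produced here need not match the (2, 1) of the example above.

```python
def hilbert_d2xy(level, d):
    """Map curve index d to (x, y) on a 2^level x 2^level grid (0-based)."""
    n = 1 << level          # grid side length
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the quadrant so sub-curves connect end to end.
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


# Level 2 gives a 4 x 4 grid holding the values 0..15.
print(hilbert_d2xy(2, 14))  # (2, 0) in this orientation
print([hilbert_d2xy(2, d) for d in range(4)])
```

Choosing a different orientation or level changes which cell a value lands in, which is what lets the direction and level act as a shared secret between nodes.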
In the data integrity checking phase, a parent node determines whether the data received from its child node are valid by comparing the first received value with the second value. For example, for each of the four received values, we determine whether the value computed by using the Jacobi symbol is one or not. If the computed value is one, the corresponding cell value is zero. That is, the fourth value is evaluated with respect to the prime 3 as $2^{(3-1)/2} \bmod 3$, which gives $\left(\frac{2}{3}\right) = (-1)^{(3^2-1)/8} = -1$, and hence the value of cell (1, 3) is 1. Meanwhile, the third value is evaluated with respect to the primes 3 and 7 as $\left(\frac{16}{3}\right) = \left(\frac{4^2}{3}\right) = 1$ and $\left(\frac{16}{7}\right) = \left(\frac{4^2}{7}\right) = 1$, and thus the value of cell (1, 2) is 0. If all values are valid, a parent node aggregates all the data received from its child nodes to process a user's query. Finally, in the data transmission phase, each sensor node sends to its parent node the encrypted data that are aggregated from its child nodes. For managing a sensor network, a sink node aggregates the data received from all the sensor nodes in the network.
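The symbol computations in this example can be reproduced with the standard binary algorithm for the Jacobi symbol, sketched below. The function name `jacobi` is hypothetical and the code is a generic textbook routine, not the authors' implementation. It applies the rule that (2/n) equals −1 when n mod 8 is 3 or 5, together with quadratic reciprocity.

```python
def jacobi(a, n):
    """Jacobi symbol (a/n) for an odd positive integer n."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    a %= n
    result = 1
    while a != 0:
        # Pull out factors of 2: (2/n) = -1 exactly when n mod 8 is 3 or 5.
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        # Quadratic reciprocity: flip the sign if both are 3 mod 4.
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


print(jacobi(2, 3))                  # -1
print(jacobi(16, 3), jacobi(16, 7))  # 1 1
print(jacobi(5, 21))                 # 1, although 5 is not a square modulo 21
```

The last line shows why the parent node evaluates the symbol with respect to each prime separately: a symbol of one modulo the product 21 does not by itself guarantee a QR, whereas symbols of one modulo both 3 and 7 do.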

Performance Analysis
In this section, we present the performance results of both our scheme and the existing schemes in terms of communication overhead, data propagation delay, and integrity checking. For the experiment, we use the TOSSIM simulator [23] running on the TinyOS operating system [22] with a GCC compiler. We use 100 sensor nodes that are randomly distributed in a 100 m × 100 m area. As in directed diffusion, we use a receiving power dissipation of 395 mW and a transmitting power dissipation of 660 mW. Table 1 shows our implementation environment, and Figure 7 shows the three types of sensor node distributions used in the experiment.
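The power figures above translate into per-message energy once a message airtime is fixed. The sketch below shows that arithmetic under a loudly stated assumption: the 1 ms airtime is a placeholder chosen for illustration and does not come from the experimental setup, and the function names are hypothetical.

```python
RX_POWER_W = 0.395  # receiving power dissipation from the text (395 mW)
TX_POWER_W = 0.660  # transmitting power dissipation from the text (660 mW)


def message_energy_joules(power_w, airtime_s):
    """Energy = power x time for a single send or receive."""
    return power_w * airtime_s


def hop_energy_joules(airtime_s):
    """Energy for one hop: one transmission plus one reception."""
    return (message_energy_joules(TX_POWER_W, airtime_s)
            + message_energy_joules(RX_POWER_W, airtime_s))


ASSUMED_AIRTIME_S = 0.001  # hypothetical 1 ms per message
print(hop_energy_joules(ASSUMED_AIRTIME_S))
```

Under this model, energy consumption grows linearly with the number of messages, which is why the comparisons that follow count transmitted messages.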

Experimental Results with Data Privacy Preserving Schemes.
We compare our Hilbert-curve based data aggregation scheme (HDA) with CPDA, SMART, Twin-Key, and GP2S in terms of the number of transmission messages and the average lifetime of the sensor nodes. Here, the number of sensor nodes ranges from 10 to 100. Figure 8 shows the communication overhead with respect to a varying number of sensor nodes. The number of transmission messages in all schemes increases as the number of sensor nodes increases. This is because every sensor node in the WSN senses data, so a larger network must transmit a larger number of messages. However, our scheme outperforms the existing schemes by about 10%-20%. The reason is that our scheme does not generate unnecessary messages during data aggregation, since each sensor node transforms only its own data, whereas the existing schemes require an additional message for privacy preservation. Figure 9 shows the number of transmission messages with respect to different distributions of sensor nodes. Figure 10 shows the number of transmission messages with a varying communication boundary when the number of sensor nodes is 100. In both figures, our scheme outperforms the existing schemes in all cases because it does not require unnecessary messages. In particular, our scheme, SMART, and GP2S show consistent performance regardless of the type of distribution and the communication boundary. This is because they are less affected by the placement of sensor nodes owing to the use of a tree topology. Meanwhile, CPDA and Twin-Key are strongly influenced by both the type of distribution and the communication boundary because they make use of a clustering method. Figure 11 shows the average lifetime of the sensor network with a varying number of sensor nodes in the WSN. In this analysis, we measure the time until the number of sensor nodes whose energy is completely consumed exceeds 50% of all sensor nodes. The lifetime of all the schemes decreases as the number of sensor nodes increases. This is because the number of messages generated in the network is proportional to the number of messages required for data aggregation. However, the lifetime of our scheme is 100%-125% longer than those of all the existing schemes because our scheme reduces unnecessary messages during data aggregation.
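The lifetime metric used in this analysis (the time at which more than 50% of the nodes have exhausted their energy) can be computed from per-node depletion times as sketched below. The function name `network_lifetime` and the sample times are hypothetical and serve only to illustrate the definition.

```python
def network_lifetime(depletion_times, total_nodes):
    """Time at which more than half of the nodes have run out of energy.

    depletion_times holds one entry per node that actually died.
    Returns None if the 50% threshold is never crossed.
    """
    needed = total_nodes // 2 + 1  # smallest count strictly above 50%
    if len(depletion_times) < needed:
        return None
    return sorted(depletion_times)[needed - 1]


# Four nodes: the threshold is crossed when the third node dies.
print(network_lifetime([40, 10, 30, 20], total_nodes=4))  # 30
# Only two of four nodes died, so the threshold is never crossed.
print(network_lifetime([5, 7], total_nodes=4))            # None
```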

Experimental Results with Data Integrity Schemes.
We compare our data integrity validation HDA scheme (iHDA) with iCPDA and iPDA in terms of the number of transmission messages per query round, the average lifetime of sensor nodes, and the attendance ratio of sensor nodes. Figure 12 shows the communication overhead with respect to a varying number of sensor nodes in a WSN. The number of transmission messages for iPDA, iCPDA, and our scheme increases as the number of sensor nodes increases. Our scheme outperforms the iPDA and iCPDA schemes because the existing schemes generate unnecessary messages during data aggregation in the network. That is, each sensor node generates only two additional messages for privacy preservation and integrity checking in our scheme, whereas the iPDA and iCPDA schemes generate six and four messages, respectively. Due to the numerous message exchanges among sensor nodes, there is a high rate of data collisions in the existing schemes. Therefore, the iPDA and iCPDA schemes are very expensive in terms of communication overhead because a very large number of messages must be generated in the network for successful data transmission. Figure 13 shows the average lifetime with respect to a varying number of sensor nodes in the WSN. The dissipated energy for all three schemes increases as the number of sensor nodes increases. This is because every message generated in the network requires energy to reach the sink node. However, in terms of lifetime, our scheme shows 35%-130% better performance than the iPDA and iCPDA schemes. The reason is that the iPDA and iCPDA schemes generate too many unnecessary messages for data aggregation to enforce both integrity protection and privacy preservation. In the existing schemes, every sensor node stays active for a longer time to send its messages. Figure 14 shows the attendance ratio of sensor nodes for data aggregation. During data aggregation, a sensor node sends the sensed data (or aggregated data) to its parent node. The attendance ratio of sensor nodes in our scheme is about 100%, whereas both iPDA and iCPDA have some sensor nodes that do not take part in data aggregation. Because a given sensor node in the iPDA and iCPDA schemes has to communicate with at least six and two neighboring nodes, respectively, some sensor nodes cannot participate in data aggregation. Therefore, our scheme shows the best performance among the three schemes.

Conclusion and Future Work
Recently, as advanced technologies of mobile devices and wireless communication proliferate, wireless sensor networks (WSNs) have increasingly attracted interest from various applications, including military and environmental monitoring. Moreover, since sensor nodes have limited resources, such as battery and memory capacity, many data aggregation techniques have been proposed for WSNs. However, wireless communication can be easily overheard, and thus the provision of a data aggregation scheme that supports data privacy is a challenging issue in WSNs. Although several data aggregation schemes have been proposed to preserve data privacy, they have the following limitations. First, the communication cost for network construction and data aggregation is considerably high. Second, only some of the existing methods support data privacy. In addition, it is necessary to ensure that the aggregated data are not polluted by an unauthorized third party. For this, we propose a new data aggregation scheme for enforcing both data privacy and data integrity in WSNs. Our scheme makes use of a seed exchanging algorithm to reduce the communication cost for preserving data privacy. It also utilizes an integrity checking algorithm based on a private information retrieval (PIR) technique. From our performance analysis, we show that our HDA scheme achieves 100%-300% longer network lifetime and about a 10% better attendance rate for the aggregated data than the existing privacy preserving schemes. In addition, our iHDA scheme achieves 40%-160% better performance in terms of network lifetime and about a 16% better participation rate for the aggregated data. As future work, we plan to verify the efficiency of our scheme in WSNs by applying it to a real environment.