Aggregate Queries in Wireless Sensor Networks

Because efficient processing of aggregate queries for fetching desired data from sensors is recognized as crucial, in-network aggregate query processing techniques have been studied intensively in wireless sensor networks. Existing representative in-network aggregate query processing techniques propose routing algorithms and data structures for processing aggregate queries. However, these techniques suffer from high energy consumption in sensor nodes, low accuracy of query processing results, and long query processing time. To solve these problems and to enhance the efficiency of aggregate query processing in wireless sensor networks, this paper proposes Bucket-based Parallel Aggregation (BPA). BPA divides a query region into several cells according to the distribution of sensor nodes, builds a quad-tree, and then processes aggregate queries in parallel for each cell region through itinerary routing. It sends data in duplicate and removes the resulting redundant data, which, in turn, enhances the accuracy of query processing results. BPA also uses a bucket-based data structure in aggregate query processing, and divides and merges the bucket data structure adaptively according to the number of data in the bucket. In addition, BPA compresses data in order to reduce the size of data in the bucket and performs data transmission filtering when each sensor node sends data. Finally, we demonstrate its superiority through various experiments using sensor data.


Introduction
With the rapid advance of sensing technologies for capturing various types of data such as temperature, humidity, and pressure, as well as the development of wireless communication technologies, wireless sensor network technologies are being actively studied for use in diverse application areas including military, medicine, meteorology, environment, transportation, home, and business [1,2].
Generally, sensor nodes do not use unicasting (which uses ACKs) when they regularly send the sensed data. Rather, they multicast or broadcast to the sensor nodes within communication range [3]. Also, sensor nodes basically know the content of the query that the S-node (the starting node) sent, the effective time of the query (from the reception of the query to the transmission of the first sensed data), and the cycle of the query (the interval at which sensed data are transmitted).
In particular, the aggregate query process, which is to obtain aggregate results from data collected by sensors, is recognized as an important research area [4]. Aggregate queries execute functions such as MAX, MIN, SUM, AVG, COUNT, MEDIAN, and HISTOGRAM on the entire wireless sensor network or a specific region of the network.
Conventional centralized aggregate query processing techniques have the problem of high energy consumption by the sensor nodes. Thus, in order to reduce the energy consumption of sensor nodes, in-network aggregate query processing, which processes aggregate queries on sensed data in the sensor nodes and then sends the results to the server, is being studied actively [5][6][7]. Representative in-network techniques include TAG (Tiny AGgregation) and IWQE (Itinerary-based Window Query Execution), which focus on the routing algorithm, and q-digest (quantile digest) and SMC (Secure Median Computation), which focus on the data structure [8][9][10][11]. TAG is an aggregate query processing technique using hierarchical routing [8], and IWQE is an aggregate query processing technique using itinerary routing [11]. Although TAG and IWQE propose routing algorithms for efficient aggregate query processing, they have problems such as high energy consumption by the sensor nodes, low accuracy of query processing results, and long query processing time. Moreover, TAG and IWQE have the shortcoming that they do not consider the aggregate operations MEDIAN and HISTOGRAM.
Q-digest is an approximate aggregate query processing technique that uses a tree data structure for the aggregate operations MEDIAN and HISTOGRAM [10,12], and SMC is an approximate aggregate query processing technique that uses a bucket data structure for the same operations [9]. In this way, q-digest and SMC suggest data structures for efficient aggregate query processing, but they still have problems such as high energy consumption by the sensor nodes, low accuracy of query processing results, and long query processing time. Moreover, q-digest and SMC do not consider composite types of aggregate queries (executing two or more aggregate queries at the same time).
In order to solve these problems in existing aggregate query processing techniques and to enhance the efficiency of aggregate query processing in wireless sensor networks, this study proposes the aggregate query processing technique BPA (bucket-based parallel aggregation). BPA collects information on sensor nodes within a query region, divides the query region into multiple cells according to the distribution of sensor nodes, builds a quad-tree using the cells, and processes an aggregate query in parallel through itinerary routing over the cell coverage of the quad-tree nodes. Because BPA processes an aggregate query in parallel, the sensor nodes consume less energy and query processing time is short even if the query region is wide or the number of sensor nodes is large.
Moreover, BPA minimizes the number of missing nodes, that is, sensor nodes in the query region that cannot participate in aggregate query processing. When a missing node occurs, it sends its data to the node closest to the sensor node that started the query. In addition, the sensor nodes of BPA send data in duplicate to reduce data loss resulting from transmission errors. Because it minimizes missing nodes and removes redundant data after duplicate transmission, BPA can enhance the accuracy of query processing results.
BPA uses a bucket-based data structure for the aggregate operations MEDIAN and HISTOGRAM. The bucket data structure stores the minimum and maximum values of collected data, the mean value of data in each bucket, and the number of data in each bucket in consideration of composite types of aggregate queries. Moreover, it compresses data using the variable bit compression coding technique [13] in order to reduce the size of data in the bucket. Using the bucket data structure and the variable bit compression coding technique, BPA can reduce the energy consumption of sensor nodes.
International Journal of Distributed Sensor Networks
Because BPA divides and merges the bucket data structure adaptively according to the number of data in the bucket, it can enhance the accuracy of query-processing results even if data distribution is uneven. Moreover, BPA performs data transmission filtering, which sets a filtering range in each sensor node and sends data only when the data are outside the range, and this reduces the energy consumption of sensor nodes.

TAG (Tiny AGgregation).
TAG is an in-network aggregate query processing technique that uses hierarchical routing [8]. That is, TAG establishes hierarchical routing for the entire wireless sensor network in order to process aggregate queries in the network. Figure 1 shows the hierarchical routing structure of TAG.
As in Figure 1, TAG establishes hierarchical routing by defining parent-child relations among all the sensor nodes. A child sensor node in the query region sends sensed data to its parent sensor node, which sends intermediate aggregate query results to its own parent sensor node. Finally, the sink node returns the final results of aggregate query processing to the server.
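The parent-to-child aggregation described above can be sketched as follows. This is a minimal illustration rather than TAG's actual implementation: the tree layout, the readings, and the function name `partial_avg` are hypothetical. AVG is carried upward as a (sum, count) pair so that each parent can merge the partial results of its children before forwarding them.

```python
# Hypothetical routing tree: parent -> list of children (node 0 is the sink).
children = {0: [1, 2], 1: [3, 4], 2: [], 3: [], 4: []}
# Hypothetical sensed value at each node.
reading = {0: 20.0, 1: 22.0, 2: 18.0, 3: 25.0, 4: 15.0}


def partial_avg(node):
    """Return the (sum, count) partial aggregate for the subtree rooted at node."""
    total, count = reading[node], 1
    for child in children[node]:
        child_total, child_count = partial_avg(child)
        total += child_total
        count += child_count
    return total, count


total, count = partial_avg(0)
network_avg = total / count  # 20.0 for the readings above
```

Each node forwards only two numbers regardless of subtree size, which is why in-network aggregation transmits less than shipping every raw reading to the server.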
TAG reduces overall energy consumption by sensor nodes through processing aggregate queries within the wireless sensor network instead of centralized processing [5,11]. Thus, TAG is efficient when the query region is wide or the number of sensor nodes is large. However, there is high energy consumption by sensor nodes not included in the query region, and a large amount of energy is consumed by sensor nodes in order to maintain routing [3,11]. In addition, TAG has the shortcoming that aggregate operations MEDIAN and HISTOGRAM are not considered [9,10,12].

IWQE (Itinerary-Based Window Query Execution).
IWQE is an in-network aggregate query processing technique that uses itinerary routing [11]. Instead of establishing routing for the entire region of the wireless sensor network, IWQE processes aggregate queries by establishing temporary routing for the region of interest when a user query is given. Figure 2 shows the itinerary routing structure of IWQE.
As in Figure 2, IWQE processes an aggregate query for data sensed by sensor nodes within the query region using itinerary routing, and the sink node returns the final result of the query to the server.
In IWQE, there is no unnecessary energy consumption by sensor nodes not included in the query region and no cost of routing maintenance. Furthermore, it is efficient when the query region is narrow or the number of sensor nodes is small [5]. However, the accuracy of query results is low due to the occurrence of missing nodes, and the redundant data that arise from broadcasting to reduce data transmission errors impair the accuracy of query processing results and increase energy consumption by the sensor nodes. In addition, it takes a long time to process queries if the number of sensor nodes is large or the query region is wide [3]. Finally, IWQE does not consider the aggregate operations MEDIAN and HISTOGRAM [9,12].

Q-digest (Quantile-Digest).
Q-digest is an approximate aggregate query processing technique that uses a tree data structure to process the aggregate operations MEDIAN and HISTOGRAM [10,12]. The tree data structure of q-digest has the following characteristics. The root node has a range value of [1, σ], and its child nodes have range values of [1, σ/2] and [σ/2+1, σ], respectively. In addition, each node stores the number of sensed data included in its range. Figure 3 shows the tree data structure of q-digest with an example. As in Figure 3, it is assumed that the whole range of sensed data is 1-8, the number of data n is 15, and compression rate k is 5. Accordingly, root node g has a range value of [1, 8], and its child nodes e and f have range values of [1, 4] and [5, 8], respectively. In addition, child nodes a, b, c, and d have range values of [1, 2], [3, 4], [5, 6], and [7, 8], respectively. In particular, each node of q-digest compresses data when the sum of the number of data in the node, in its parent node, and in its neighbor (sibling) node is smaller than the compression threshold n/k.
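The compression rule can be illustrated on the seven-node tree of the example. The sketch below is an assumption-laden illustration, not the q-digest authors' code: the per-node counts are hypothetical (the text gives only n = 15 and k = 5), the names `PARENT`, `SIBLING_PAIRS`, and `compress` are invented here, and the threshold is taken as the integer part of n/k.

```python
# Tree from the example: leaves a=[1,2], b=[3,4], c=[5,6], d=[7,8];
# inner nodes e=[1,4], f=[5,8]; root g=[1,8].
PARENT = {"a": "e", "b": "e", "c": "f", "d": "f", "e": "g", "f": "g"}
SIBLING_PAIRS = [("a", "b"), ("c", "d"), ("e", "f")]  # visited bottom-up


def compress(counts, k):
    """Merge a sibling pair into its parent when the three counts sum below n // k."""
    counts = dict(counts)
    threshold = sum(counts.values()) // k
    for left, right in SIBLING_PAIRS:
        parent = PARENT[left]
        if counts[left] + counts[right] + counts[parent] < threshold:
            counts[parent] += counts[left] + counts[right]
            counts[left] = counts[right] = 0
    return {name: c for name, c in counts.items() if c > 0}


# Hypothetical counts that add up to n = 15; with k = 5 the threshold is 3.
counts = {"a": 1, "b": 1, "c": 6, "d": 7, "e": 0, "f": 0, "g": 0}
digest = compress(counts, k=5)  # {'c': 6, 'd': 7, 'g': 2}
```

The sparse left half of the range collapses into the root while the dense nodes c and d keep their own counts, so the total of 15 is preserved but the two values under g can no longer be located more precisely than [1, 8].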
As q-digest processes aggregate queries using a tree data structure, it reduces the amount of data transmission and, consequently, shows high performance in the aggregate operations MEDIAN and HISTOGRAM. However, the energy consumption of sensor nodes is high due to the additional information needed for building the tree data structure, and data compression lowers the accuracy of aggregate query results. Moreover, because the range values of nodes are fixed, the results of aggregate query processing become less accurate if sensed data are not distributed evenly [9,14]. What is more, q-digest does not consider composite types of aggregate queries.

SMC (Secure Median Computation).
SMC is an approximate aggregate query processing technique that uses a bucket data structure for the aggregate operations MEDIAN and HISTOGRAM [9]. The bucket data structure of SMC has the following characteristics. SMC forms b buckets to store sensed data, and each bucket B_i covers the range from q_i to q_{i+1} and stores the number of sensed data in that range. In particular, SMC divides or merges buckets in which specific aggregate values are stored and, by doing so, enhances the accuracy of aggregate query processing results in the next query. Figure 4 shows the bucket data structure of SMC with an example. As in Figure 4, the bucket data structure has an entire data range of 0-50 and consists of 5 buckets. Bucket a has range value 0-10, b 11-20, c 21-30, d 31-40, and e 41-50, and the buckets store 3, 1, 8, 4, and 7 sensed data, respectively. SMC processes aggregate queries using the bucket data structure and, by doing so, reduces the amount of data transmission and shows high performance in the aggregate operations MEDIAN and HISTOGRAM. However, the energy consumption of sensor nodes is high due to the transmission of the fixed bucket data structure, and the accuracy of aggregate query processing results is low due to the merging of bucket data [14]. Moreover, SMC does not consider composite types of aggregate queries.
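To see how a bucket histogram supports an approximate MEDIAN, the sketch below locates the bucket that contains the median rank, using the five buckets and counts of the Figure 4 example. It is only an illustration of the general idea; the function name `median_bucket` is hypothetical and SMC's actual procedure may differ.

```python
# Buckets from the example: (name, low, high, number of sensed data).
buckets = [("a", 0, 10, 3), ("b", 11, 20, 1), ("c", 21, 30, 8),
           ("d", 31, 40, 4), ("e", 41, 50, 7)]


def median_bucket(buckets):
    """Return (name, low, high) of the bucket holding the lower median."""
    total = sum(count for _, _, _, count in buckets)
    if total == 0:
        raise ValueError("empty histogram")
    rank = (total + 1) // 2  # 1-based rank of the lower median
    seen = 0
    for name, low, high, count in buckets:
        seen += count
        if seen >= rank:
            return name, low, high
    raise AssertionError("unreachable: rank never exceeds total")


result = median_bucket(buckets)  # ('c', 21, 30): rank 12 of 23 falls in bucket c
```

The answer is only known to lie somewhere in 21-30, which shows why narrower buckets around the median improve accuracy for the next query.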

BPA (Bucket-Based Parallel Aggregation)

Routing.
BPA establishes hierarchical routing and collects sensor node information in order to reduce energy consumption by sensor nodes and query processing time. Then, using the collected sensor node information, it divides the query region into a number of cells according to the distribution of sensor nodes, builds a quad-tree with the cells, and processes an aggregate query in parallel on the cell coverage of the quad-tree through itinerary routing. Figure 5 shows the hierarchical routing structure and an example of the MBR structure for collecting sensor node information.
As in Figure 5, the sensor node closest to the center of the query region is searched for, and this node is used as the R-node (root node) of the hierarchical routing to be established. Starting from the R-node, each sensor node with child nodes defines an MBR (minimum bounding rectangle) that includes itself and its child sensor nodes, collects information on the sensor nodes within the MBR, and sends the data to its parent sensor node.
Using the collected sensor node information, BPA divides the query region into a number of cells and builds a quad-tree with the cells. Figure 6 shows an example of the quad-tree structure.
As in Figure 6, query region QR is divided into CR1, CR2, CR3, and CR4; CR4 is again subdivided into CR5, CR6, CR7, and CR8; and CR8 is again subdivided into CR9, CR10, CR11, and CR12. Each cell is subdivided until the number of sensor nodes in the cell becomes smaller than the threshold, that is, the maximum number of sensor nodes allowed in each cell.
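The subdivision rule can be sketched as a short recursive function. This is a minimal sketch under stated assumptions: cells are split into four equal quadrants, node positions and the threshold are hypothetical, and the names `build_quadtree`, `leaves`, and the `max_depth` guard (which stops runaway recursion on coincident points) are not from the paper.

```python
def build_quadtree(points, x0, y0, x1, y1, max_nodes, depth=0, max_depth=8):
    """Split a cell into four quadrants until it holds fewer than max_nodes points."""
    cell = {"bounds": (x0, y0, x1, y1), "points": list(points), "children": []}
    if len(points) < max_nodes or depth >= max_depth:
        return cell  # leaf cell
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    groups = [[], [], [], []]
    for x, y in points:
        # Quadrant index: bit 0 is the right half, bit 1 is the upper half.
        groups[(x >= mx) + 2 * (y >= my)].append((x, y))
    bounds = [(x0, y0, mx, my), (mx, y0, x1, my), (x0, my, mx, y1), (mx, my, x1, y1)]
    for group, (bx0, by0, bx1, by1) in zip(groups, bounds):
        cell["children"].append(
            build_quadtree(group, bx0, by0, bx1, by1, max_nodes, depth + 1, max_depth))
    return cell


def leaves(cell):
    """Return all leaf cells below (and including) the given cell."""
    if not cell["children"]:
        return [cell]
    return [leaf for child in cell["children"] for leaf in leaves(child)]


# Hypothetical sensor positions in a 100 x 100 query region, threshold of 3 nodes.
points = [(10, 10), (20, 80), (80, 20), (60, 60), (70, 70), (90, 90), (95, 95), (85, 60)]
tree = build_quadtree(points, 0, 0, 100, 100, max_nodes=3)
leaf_cells = leaves(tree)  # 7 leaf cells; only the dense upper-right cell is split again
```

Dense areas end up with smaller cells, so each cell's itinerary covers a comparable number of sensor nodes and the cells can be processed in parallel.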
BPA selects a C-node, which is the representative sensor node, within each quad-tree cell and processes an aggregate query in parallel for the coverage of each cell in the quad-tree. The result of aggregate query processing for the coverage of each cell is transmitted recursively to the representative sensor node of the parent node cell. Figure 7 shows the recursive process of transmitting the result of aggregate query processing to the representative sensor node of the parent node cell.
As in Figure 7, the results of aggregate query processing in C10-node, C11-node, and C12-node are transmitted to C8-node, and the result of aggregate query processing in C8-node is transmitted to C9-node. In addition, the results of aggregate query processing in C6-node, C7-node, and C9-node are sent to C4-node, and the result of aggregate query processing in C4-node is sent to C5-node. Lastly, the results of aggregate query processing in C1-node, C2-node, C3-node, and C5-node are transmitted to the R-node, and the result of aggregate query processing in the R-node is returned to the S-node, the sensor node that started the query.
BPA uses itinerary routing in order to process aggregate queries in quad-tree cells. Figure 8 shows an example of routing process in quad-tree cells.
As in Figure 8, a Q-node, which is the query transmission sensor node within each cell, collects data from D-nodes, which are data transmission sensor nodes within its communication range, along the ideal itinerary routing, processes an aggregate query, and sends the result to the next Q-node. The actual routing path of the Q-nodes is the real itinerary routing, and each itinerary routing interval W is set to $W = (\sqrt{3}/2)R$ using the sensor nodes' communication range R.
In order to minimize the number of missing nodes not participating in aggregate query processing in the itinerary routing process, BPA selects the closest sensor node to Ideal Q-node among D-nodes within the communication range of Q-node as the next Q-node. Ideal Q-node is a virtual sensor node that does not exist but is set for optimal routing, and means the intersecting point between the line of the ideal itinerary routing and the communication range of Q-node.
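The next-Q-node rule can be sketched geometrically. The code below is a simplified illustration under assumptions that are not in the paper: the ideal itinerary is taken to be a horizontal line traversed in the +x direction, positions are hypothetical, and the function names `ideal_q_node` and `next_q_node` are invented here.

```python
import math


def ideal_q_node(q, line_y, R):
    """Forward intersection of the itinerary line y = line_y with the
    communication circle (radius R) of the current Q-node q."""
    dy = line_y - q[1]
    if abs(dy) > R:
        raise ValueError("itinerary line is outside the communication range")
    return (q[0] + math.sqrt(R * R - dy * dy), line_y)


def next_q_node(q, d_nodes, line_y, R):
    """Pick the D-node within range of q that is closest to the ideal Q-node."""
    ideal = ideal_q_node(q, line_y, R)
    in_range = [d for d in d_nodes if math.dist(q, d) <= R]
    if not in_range:
        return None
    return min(in_range, key=lambda d: math.dist(d, ideal))


# Hypothetical layout: Q-node at the origin, range R = 10, itinerary along y = 0.
d_nodes = [(3, 4), (8, 1), (6, -7), (12, 0)]
chosen = next_q_node((0, 0), d_nodes, line_y=0, R=10)  # (8, 1)
```

Here (12, 0) lies on the itinerary but is out of range, so the in-range node nearest to the ideal point (10, 0) is chosen; staying close to the ideal path keeps the following itinerary step from leaving nodes uncovered.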
As noted above, sensor nodes know the content of the query sent by the S-node (the starting node), the effective time of the query, and the cycle of the query. Therefore, if a sensor node is not selected as a Q-node or a D-node within the effective time of the query, it recognizes itself as a missing node.
A missing node is processed as follows. If a sensor node finds itself to be a missing node because it has not been selected as a Q-node or a D-node within the effective time of the query, it sends its data to the sensor node closest to the S-node. Figure 9 shows an example of missing node processing. As shown in Figure 9, a missing node calculates distances using the location information of the S-node and of the sensor nodes within its communication range, and selects the sensor node nearest to the S-node.
D-nodes and Q-nodes in BPA perform double data transmission among sensor nodes in order to reduce errors in the results of aggregate query processing caused by network transmission errors. Figure 10 shows an example of the double data transmission process by a D-node in BPA.
As in Figure 10, a D-node sends sensed data to the Q-node and to its neighbor D'-node, which is another D-node. That is, the D-node sends data to both the Q-node and the D'-node, which is one of its neighbor sensor nodes, includes the Q-node within its communication range, and satisfies the right-hand rule. Then, the D'-node sends the data from the D-node together with the data sensed by itself to the Q-node. Figure 11 shows an example of the double data transmission process by a Q-node in BPA.
As in Figure 11, a Q-node sends data collected from D-nodes to the next Q-node and to its neighbor Q'-node. That is, the Q-node sends data to both the next Q-node and the Q'-node, which is one of its neighbor sensor nodes, includes the next Q-node within its communication range, and is the closest to the next Q-node. Then, the Q'-node sends the data from the Q-node together with the data sensed by itself to the next Q-node.
In BPA, when a sensor node sends data, it adds its ID and sends the data in duplicate, and the sensor node that has received the data solves the data redundancy problem by removing data with a redundant ID.
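Removing redundant data by sender ID can be sketched in a few lines. The message layout (ID, value) follows the D-node description in the text, while the readings and the function name `remove_redundant` are hypothetical.

```python
def remove_redundant(received):
    """Keep the first copy of each reading, identified by the sender's ID."""
    unique = {}
    for node_id, value in received:
        if node_id not in unique:
            unique[node_id] = value
    return unique


# Hypothetical messages at a Q-node: nodes 7 and 9 arrive twice (direct copy and relayed copy).
received = [(7, 21.5), (9, 19.0), (7, 21.5), (4, 23.0), (9, 19.0)]
unique = remove_redundant(received)       # {7: 21.5, 9: 19.0, 4: 23.0}
avg = sum(unique.values()) / len(unique)  # averaged over 3 readings, not 5
```

Without this step the duplicated readings of nodes 7 and 9 would be counted twice and bias COUNT, SUM, and AVG.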

Data Structure.
BPA uses a bucket-based data structure and the variable bit compression coding technique in order to reduce energy consumption by sensor nodes in processing aggregate queries such as MEDIAN and HISTOGRAM. The data store structure of a D-node consists of ID, which is the ID of the sensor node, and Value, which is the sensed data. In addition, the data store structure of a Q-node contains ID, indicating the ID of the sensor node; MinValue and MaxValue, indicating the minimum and maximum values of collected data; AValue_i, indicating the average value of data in bucket i; and Count_i, indicating the number of data in bucket i. Figure 12 shows the bucket data structure of a Q-node.
As in Figure 12, the whole bucket list has size BLSize (BLMax − BLMin), and the initial size of each bucket is BMax, which is the maximum size of a bucket. In addition, the bucket list stores information on individual buckets such as AValue_i, which is the mean value of data in bucket i, and Count_i, which is the number of data in bucket i. AValue_i is obtained by
$$\mathit{AValue}_i = \frac{1}{\mathit{Count}_i} \sum_{k=1}^{\mathit{Count}_i} \mathit{Value}_i^k \qquad (1)$$
In (1), AValue_i is the mean of the data included in bucket i, Count_i is the number of data in bucket i, and Value_i^k is the k-th data item in bucket i.
BPA uses the variable bit compression coding technique [13] for data compression. That is, BPA compresses Value in D-nodes and MinValue, MaxValue, and AValue in Q-nodes, which occupy the largest part of the data in D-nodes and Q-nodes, and stores the compressed data in order to reduce the size of data in sensor node data transmission.
Moreover, BPA divides and merges bucket data structure adaptively according to the number of data in the bucket in order to enhance the accuracy of query processing results. Figures 13 and 14 show an example of bucket division and bucket merger when bucket list size BLSize is 60 (MinValue = 0, MaxValue = 60), maximum bucket size BMax is 10, minimum bucket size BMin is 2, and maximum number of data in the bucket BMaxC is 5.
As in Figure 13, when the number of data in bucket 4.1 exceeds the maximum number of data in the bucket (BMaxC = 5), it is divided into buckets 4.1.1 and 4.1.2, but because minimum bucket size BMin is 2, buckets 4.1.1 and 4.1.2 are not divided any longer.
As in Figure 14, because the sum of the data counts of two buckets 4.1.1 and 4.1.2 is less than the maximum number of data in the bucket (BMaxC = 5) and maximum bucket size BMax is 10, the two buckets merge into bucket 4.1.
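The division and merger rules of Figures 13 and 14 can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: a bucket is split in half at its midpoint, raw values are kept inside the bucket only so that the halves can be recomputed (a Q-node in the text stores just the mean and the count), the sample values are hypothetical, and the names `Bucket`, `split`, and `merge` are invented here.

```python
from dataclasses import dataclass, field

B_MIN, B_MAX, B_MAX_C = 2, 10, 5  # BMin, BMax, BMaxC from the example


@dataclass
class Bucket:
    low: float
    high: float
    values: list = field(default_factory=list)  # kept only for illustration

    @property
    def count(self):
        return len(self.values)

    @property
    def avalue(self):
        return sum(self.values) / self.count if self.values else 0.0


def split(bucket):
    """Halve an overfull bucket unless the halves would be smaller than B_MIN."""
    half = (bucket.high - bucket.low) / 2
    if bucket.count <= B_MAX_C or half < B_MIN:
        return [bucket]
    mid = bucket.low + half
    left = Bucket(bucket.low, mid, [v for v in bucket.values if v < mid])
    right = Bucket(mid, bucket.high, [v for v in bucket.values if v >= mid])
    return [left, right]


def merge(left, right):
    """Merge adjacent buckets when they are sparse and the result fits in B_MAX."""
    if left.count + right.count < B_MAX_C and right.high - left.low <= B_MAX:
        return [Bucket(left.low, right.high, left.values + right.values)]
    return [left, right]


# Bucket 4.1 (assumed range 30-35) holds 6 values, more than BMaxC = 5, so it splits.
parts = split(Bucket(30, 35, [30.5, 31, 31.5, 33, 34, 34.5]))
# A 2.5-wide bucket is not split further: its halves (1.25) would be below BMin = 2.
unsplit = split(Bucket(30, 32.5, [30.1, 30.2, 30.3, 31, 32, 32.4]))
# Two sparse neighbours (3 values in total) merge back into one bucket.
merged = merge(Bucket(30, 32.5, [31]), Bucket(32.5, 35, [33, 34]))
```

Splitting gives finer resolution where data cluster, and merging returns sparse regions to coarse buckets so the bucket list does not keep growing.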
In order to reduce the energy consumption of sensor nodes, BPA sets a filtering range for each sensor node and sends data only when the data are outside the filtering range.
D-nodes are allocated filtering range DF. As in (2), DF of a D-node is calculated using TF (the total filtering range) and TSC (the number of sensor nodes in the query region). A D-node sends data only when the data are outside DF.
Q-nodes are allocated initial filtering range IQF and reset filtering range QF during aggregate query processing. In addition, Q-nodes send data only when the sum of bucket lists is outside filtering range QF. As in (3), IQF of a Q-node is calculated using TF and TSC.
In aggregate query processing, a Q-node resets QF using its own IQF, DF of the D-nodes, DSC (the number of D-nodes within its communication range that have sent data to the Q-node), and QF of the previous Q-node. If there is a previous Q-node and it has sent data, the Q-node resets QF to the sum of its IQF, the DFs of the D-nodes that have sent data, and QF of the previous Q-node. This can be expressed as
$$\mathit{QF} = \mathit{IQF} + (\mathit{DF} \times \mathit{DSC}) + \mathit{QF}_{prev} \qquad (4)$$
where QF_prev denotes QF of the previous Q-node. If there is a previous Q-node but it has not sent data, or if there is no previous Q-node, QF of the Q-node is reset to the sum of its IQF and the DFs of the D-nodes that have sent data. This can be expressed as
$$\mathit{QF} = \mathit{IQF} + (\mathit{DF} \times \mathit{DSC}) \qquad (5)$$
As noted earlier, sensor nodes know the content of the query sent by the S-node, the effective time of the query, and the cycle of the query. Therefore, if a Q-node receives no data from the previous Q-node within the cycle of the query, or if there is no previous Q-node, it sets QF according to (5).
As in Figure 15, because DF of the D-nodes is 1, D-nodes b, d, and e do not send data. However, D-node a sends DSD_a (15) to Q-node h because its sensed data are outside filtering range DF_a. In addition, because Q-node h does not have a previous Q-node, QF_h is 1 + (1 × 1) = 2 by (5). In Q-node h, the sum of bucket list QPSD_h is 65.5 and the sum of bucket list QSD_h is 68.5, which are outside filtering range QF_h (they differ by 3), so Q-node h sends QSD_h (5, 21, 5.25, 2, 13.5, 2, 31, 1) to Q-node i.
As in Figure 16, because DF of the D-nodes is 1, D-node g does not send data, but D-nodes c and f send DSD_c (36.5) and DSD_f (37), respectively, to Q-node i because their sensed data are outside filtering ranges DF_c and DF_f, respectively. In addition, because there is a previous Q-node (Q-node h) and it has sent data, QF_i of Q-node i is 1 + (1 × 2) + 2 = 5 by (4). In Q-node i, the sum of bucket list QPSD_i is 243.5 and the sum of bucket list QSD_i is 245.9, which are within filtering range QF_i (they differ by 2.4), so Q-node i does not send data to Q-node j.
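The two worked examples can be replayed in code. The sketch rests on assumptions inferred from the numbers in the text rather than stated by it: QF is computed as IQF + DF × DSC, plus the previous Q-node's QF when that node has sent data; a Q-node sends when the two bucket list sums differ by more than QF; and the last six entries of QSD_h are read as (AValue, Count) pairs. The function names are hypothetical.

```python
def reset_qf(iqf, df, dsc, prev_qf=None):
    """Reset a Q-node's filtering range.

    prev_qf is the QF of the previous Q-node if it exists and has sent data,
    otherwise None.
    """
    qf = iqf + df * dsc
    if prev_qf is not None:
        qf += prev_qf
    return qf


def bucket_list_sum(pairs):
    """Sum of a bucket list given as (AValue, Count) pairs."""
    return sum(avalue * count for avalue, count in pairs)


def q_node_sends(previous_sum, current_sum, qf):
    """Assumed rule: send only when the sums differ by more than QF."""
    return abs(current_sum - previous_sum) > qf


qf_h = reset_qf(iqf=1, df=1, dsc=1)                # 2: no previous Q-node
qf_i = reset_qf(iqf=1, df=1, dsc=2, prev_qf=qf_h)  # 5: previous Q-node h has sent data
qsd_h = bucket_list_sum([(5.25, 2), (13.5, 2), (31, 1)])  # 68.5
h_sends = q_node_sends(65.5, qsd_h, qf_h)          # True: the sums differ by 3
i_sends = q_node_sends(243.5, 245.9, qf_i)         # False: the sums differ by about 2.4
```

Under this reading the pairs reproduce the stated sum of 68.5 for QSD_h, and the send decisions match the text: Q-node h transmits while Q-node i stays silent.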

Algorithm.
The BPA algorithm consists of processes for generating itinerary routing, sending an aggregate query through the generated itinerary routing, and returning the results of the aggregate query. Algorithm 1 shows the whole of the BPA algorithm.
As in Algorithm 1, the BPA algorithm first selects R-node, the closest sensor node to the center of the query region, builds hierarchical routing based on R-node, and collects sensor node information. Then, it forms a quad-tree using the collected sensor node information, and creates itinerary routing by selecting representative sensor node C-node from each cell of the quad-tree.
In addition, the BPA algorithm processes an aggregate query through the itinerary routing, starting from each C-node. That is, a Q-node sends an aggregate query through itinerary routing, and D-nodes return the results of the aggregate query. At that time, the Q-node and D-nodes determine whether data have been filtered or not, and if not filtered, they compress only the changed bucket information and send it to the next node. In aggregate query processing, missing nodes, which are neither a Q-node nor a D-node, send query results to the node closest to the S-node, the sensor node that started the query.

Performance Evaluation Environment.
The hardware specifications of the system used in performance evaluation were Intel Core 2.4 GHz CPU, 2 GB RAM, and 300 GB HDD, and its operating system was Windows XP. In addition, MFC (Microsoft Foundation Class Library) was used in simulation, and 13 parameters as in Table 1 were set in performance evaluation.
In particular, when evaluating MEDIAN query processing for BPA, TAG-q-digest, TAG-SMC, IWQE-q-digest, and IWQE-SMC, we tuned the data structures of BPA, q-digest, and SMC so that the accuracy of query processing results stayed above 95%. Figure 17 presents the results of performance evaluation in AVG query processing for BPA, TAG, and IWQE, showing the accuracy of results, energy consumption, and query processing time according to the number of sensor nodes.

Results of Performance Evaluation.
As in Figure 17, BPA showed 29% higher performance than IWQE and 18% higher than TAG in terms of the accuracy of query processing results. In terms of energy consumption, BPA showed 59% higher performance than TAG, and 37% higher than IWQE. Also in terms of query processing time, BPA showed 57% higher performance than IWQE and 28% higher than TAG. Figure 18 presents the results of performance evaluation in MEDIAN query processing for BPA, TAG-q-digest, TAG-SMC, IWQE-q-digest, and IWQE-SMC, showing energy consumption and query processing time according to the number of sensor nodes. As in Figure 18, BPA showed 101% higher performance than TAG-q-digest, 88% higher than TAG-SMC, 66% higher than IWQE-q-digest, and 55% higher than IWQE-SMC in terms of energy consumption. In terms of query processing time as well, BPA showed 93% higher performance than IWQE-q-digest, 87% higher than IWQE-SMC, 66% higher than TAG-q-digest, and 59% higher than TAG-SMC. Figure 19 presents the results of performance evaluation in AVG query processing for BPA, TAG, and IWQE, showing the accuracy of results, energy consumption, and query processing time according to the number of continuous queries.
As in Figure 19, BPA showed 28% higher performance than IWQE and 18% higher than TAG in terms of the accuracy of query processing results. In terms of energy consumption, BPA showed 66% higher performance than TAG, and 42% higher than IWQE. Also in terms of query processing time, BPA showed 56% higher performance than IWQE and 26% higher than TAG. Figure 20 presents the results of performance evaluation in MEDIAN query processing for BPA, TAG-q-digest, TAG-SMC, IWQE-q-digest, and IWQE-SMC, showing energy consumption and query processing time according to the number of continuous queries.
As in Figure 20, BPA showed 104% higher performance than TAG-q-digest, 93% higher than TAG-SMC, 71% higher than IWQE-q-digest, and 59% higher than IWQE-SMC in terms of energy consumption. In terms of query processing time as well, BPA showed 90% higher performance than IWQE-q-digest, 80% higher than IWQE-SMC, 64% higher than TAG-q-digest, and 56% higher than TAG-SMC. Figure 21 presents the results of performance evaluation in AVG query processing for BPA, TAG, and IWQE, showing the accuracy of results, energy consumption, and query processing time according to the size of query region.
As in Figure 21, BPA showed 28% higher performance than IWQE and 18% higher than TAG in terms of the accuracy of query processing results. In terms of energy consumption, BPA showed 45% higher performance than TAG, and 31% higher than IWQE. Also in terms of query processing time, BPA showed 75% higher performance than IWQE and 23% higher than TAG. Figure 22 presents the results of performance evaluation in MEDIAN query processing for BPA, TAG-q-digest, TAG-SMC, IWQE-q-digest, and IWQE-SMC, showing energy consumption and query processing time according to the size of query region.
As in Figure 22, BPA showed 94% higher performance than TAG-q-digest, 84% higher than TAG-SMC, 68% higher than IWQE-q-digest, and 56% higher than IWQE-SMC in terms of energy consumption. In terms of query processing time as well, BPA showed 100% higher performance than IWQE-q-digest, 90% higher than IWQE-SMC, 66% higher than TAG-q-digest, and 54% higher than TAG-SMC.

Analysis of Performance Evaluation.
When performance in AVG query processing was evaluated according to the number of sensor nodes, the number of continuous queries, and the size of the query region, BPA showed higher performance than TAG and IWQE in terms of the accuracy of processing results, energy consumption, and query processing time. This is probably because BPA solves the problem in TAG that data of sensor nodes not included in the query region are transmitted, resolves the shortcoming of IWQE by reducing the number of missing nodes arising in the routing process, and processes an aggregate query in parallel by dividing the query region.
Moreover, when performance in MEDIAN query processing was evaluated according to the number of sensor nodes, the number of continuous queries, and the size of the query region, BPA showed higher performance than TAG-q-digest, TAG-SMC, IWQE-q-digest, and IWQE-SMC in terms of energy consumption and query processing time. This is probably because BPA does not use a data structure with a fixed range as q-digest and SMC do but updates the data structure adaptively, sends only changed bucket information instead of sending all aggregate data each time, and performs compression and filtering on the data to be transmitted.
Particularly in AVG and MEDIAN query processing, BPA showed even higher performance than the existing techniques when the number of sensor nodes and the number of continuous queries were large and when the size of query region was large.

Conclusions
This study proposed BPA, a bucket-based parallel aggregate query processing technique for more efficient aggregate query processing in wireless sensor networks. In order to reduce the energy consumption of sensor nodes and query processing time, BPA organizes the query region into a quad-tree and processes an aggregate query in parallel through itinerary routing over the cell coverage of the quad-tree nodes. In addition, it minimizes the occurrence of missing nodes for higher accuracy of query processing results and reduces data loss from transmission errors through double data transmission by sensor nodes.
BPA also uses a bucket-based data structure and the variable bit compression coding technique in order to reduce energy consumption by sensor nodes in processing the aggregate queries MEDIAN and HISTOGRAM. In particular, for higher accuracy of query processing results, it divides and merges the bucket data structure adaptively according to the number of data in the bucket. What is more, data are transmitted only when they are outside the filtering range, which reduces the energy consumption of sensor nodes. Lastly, we demonstrated the superiority of BPA as an aggregate query processing technique through various experiments using sensor data.