Continuous Probabilistic Skyline Queries for Uncertain Moving Objects in Road Network

In a moving environment, the positions of moving objects cannot be located accurately. Apart from measuring-instrument errors, the movement of the objects is the main factor contributing to this uncertainty. This uncertainty makes the dominant relationship among data unstable, which affects the skyline operator. In this paper, we study the continuous probabilistic skyline query for uncertain moving objects in a road network. The query point is assumed to be stationary, while moving objects are treated as targets whose uncertainty is described by a probability density function. After defining the notions of dominant probability and probabilistic skyline, we put forward a novel algorithm to process continuous probabilistic skyline queries on a road network. Firstly, we compute the dominant probability and skyline probability to obtain the initial permanent p-skyline set. Then we define events to predict the time when the dominant relationship between moving objects may change. Furthermore, we track and process these events to update the probabilistic skyline incrementally. Two pruning strategies are proposed to discard invalid events and objects in order to reduce the search space. Finally, an extensive experimental evaluation on real datasets shows that the proposed algorithm updates probabilistic skyline sets in a road network both efficiently and effectively.


Introduction
A skyline query aims to find the subset of objects that are not dominated by any other object in the dataset. It helps users in multicriteria decision making, data mining, database visualization, and so forth.
In a mobile database, a moving object reports its position and velocity to the database service through a wireless communication interface. Because of time delay and other technical limitations, the position obtained usually deviates from the actual one. This deviation causes what is called uncertainty, which leads to instability of the dominant relationship between objects. In the real world, objects are restricted to a road network, such as a track, road, or highway. The position uncertainty of moving objects, together with their environment, should be considered in the skyline operation.
Until recently, most work on skyline queries focused on static datasets, where the distances from the query point to the target objects are invariant. With the rapid development of GPS technology and mobile devices, the location-based service (LBS) [1], which is pervasive in daily life, is becoming one of the most important applications of spatiotemporal databases. However, in a moving environment, moving objects whose positions are sampled at intervals by GPS are always in motion. Delay or error of the GPS sensor, and especially the motion of the objects themselves, makes the location of moving objects uncertain. Despite this uncertainty, the position is often treated as accurate when queried in many studies [2,3]. The continuous skyline query on moving objects was first proposed by Huang et al. [4]. In that work, the position of moving objects is treated as certain data.
In fact, because of sensor errors and the objects' movement, the position of objects in a moving environment is uncertain and vague. Recently, a few researchers have become aware of this imprecision and made some contributions [5][6][7]. In our approach, the location of the query object is exactly known as a stationary object, but the location of a moving target object is characterized by a certain distribution instead of a precise point. For example, a weapon-related crime takes place somewhere, which is located in point , as Figure 1 shows. Police cars should be dispatched to intervene. Adequate police strength, including the number of policemen and the equipment in the dispatched car, is needed. A car as near to the crime scene as possible is another factor to consider in order to arrive in time. In Figure 1, the table shows the parameters of each police car, including the position of the car, the number of policemen, the equipment level, and the network distance to the query point , where the equipment level quantifies the level of equipment, with a greater value corresponding to a higher level. If the police cars are treated as certain points, cars and are skyline objects. However, the cars are moving quickly, so their positions are uncertain to some extent. For example, suppose that the true position of car is at point , where the distance from to is 130 instead of 125, and car 's actual position is at point whose distance to is 125. In this case, car can dominate car , so car is not a skyline object. Because of the road network limitation, the uncertain area is represented as a line segment centered on the acquired location, as Figure 1 shows. To simplify this scene, the probability density function ( ) is assumed to be a uniform distribution, and the police car may be a skyline object.
International Journal of Distributed Sensor Networks
In this paper, we address the problem of continuous probabilistic skyline query for uncertain moving objects in road network. In our study case, the query object stays still, while the target objects are moving with uncertainty which is characterized by both a closed uncertainty region and a probability density function (pdf). The main contributions are as follows.
(1) Probabilistic skyline on uncertain moving object in road network is introduced as a new important issue in decision support or navigation systems.
(2) By analyzing the dominant relationship between moving objects in uncertain position environment, the dominant probability and skyline probability are defined in case the query is stationary with targets moving.
(3) Based on trigger events, which are introduced to track changes in the dominant relationship, a novel algorithm, Probabilistic Skyline query with Uncertainty in Road network (PSUR), is proposed to process the dynamic skyline query with uncertainty. A series of pruning strategies are introduced to optimize and accelerate this incremental algorithm.
(4) Extensive experiments on two real datasets are conducted to analyze how the uncertain region, the number of static dimensions, the velocity of moving objects, and the dataset size affect the algorithm. Comparative experiments against the methods in [8] are made to validate the proposed PSUR. The experimental results show that PSUR is effective and efficient.
The rest of this paper is organized as follows. In Section 2, we summarize the related work on skyline computation. In Section 3, after describing the basic notion of skyline and the uncertain model for moving objects, we define dominant probability and probabilistic skyline. Section 4 introduces trigger events to track how the dominant probability relationship varies when objects are moving with uncertainty, and adopts two pruning strategies to reduce the search space. Section 5 describes the PSUR algorithm for updating the p-skyline set. Section 6 gives an experimental evaluation on two real datasets. Finally, Section 7 concludes the paper.

Related Work
Skyline queries are an active area of database research that has attracted increasing attention. The skyline operator was first introduced into relational databases by Borzsony et al. [9], together with two processing algorithms: the Block-Nested-Loops algorithm (BNL) and the Extended Divide & Conquer algorithm (D&C). The Sort-Filter-Skyline algorithm (SFS) [10], proposed by Chomicki et al. as an improvement on BNL, constructs the multidecision dominative order chains for the ordered data. The NN algorithm [11], based on the R-tree, searches the nearest neighbor recursively. It speeds up the skyline query by reducing the number of comparisons among objects that BNL requires. Nevertheless, it costs more time and space because it searches subspaces repeatedly. To overcome this limitation, BBS [12], based on a sorted R-tree, was proposed. It is one of the best skyline query methods for centralized datasets.
In recent years, the skyline query has been extended to dynamic datasets. The R-tree based I-Eager and I-Lazy algorithms were put forward by Tao and Papadias [13], who first studied how to update and maintain skyline results in dynamic datasets. Tian et al. [14] introduced GICSC, which updates skyline query sets dynamically and is better suited to low-dimensional metrics. Kontaki et al. [15] worked on maintaining k-dominant skyline objects with maximum user preference. Huang et al. [4] investigated the continuous skyline query for certain moving objects, assuming that all points, including the query point, move in a predictable way. After analyzing the dominating relationship between points in the space, Huang presented a continuous tracking algorithm, CSQ, to maintain skyline sets dynamically.
In sensor networks and moving environments, the characteristics of objects are not exactly known because of the limitations of measuring equipment and the objects' movement. A lot of work has focused on uncertain data. The authors in [5] studied the execution of probabilistic range and nearest-neighbor queries in a mobile environment. The author in [16] focused on the situation in which the location of a query object is not exactly known. In the field of skyline queries, Fiedler [17] first proposed a skyline operator on uncertain data in his dissertation. Pei et al. [18] proposed a probabilistic skyline model for multiinstance data, where each object is part of the skyline with a certain probability. They presented two algorithms, BUM and TDM, to study the skyline query on uncertain data under the possible world model (PWM). Lian and Chen [19] studied the reverse-skyline query. They modeled the probabilistic reverse skyline query on uncertain data, in both the monochromatic and bichromatic cases, and proposed two effective pruning methods, MPRS and BPRS, to reduce the search space of query processing. Zhang et al. [20] explored how to maintain skyline sets when uncertain data are updated. An AR-tree is constructed for all uncertain data, and maintenance consists of insertions, deletions, and updates on this AR-tree. The authors in [21] designed a partitioning method to compute skyline probabilities for discrete data with uncertainty. The authors in [8] investigated skyline probabilities with a parametric-form pdf (e.g., a Gaussian function or a Gaussian mixture model). The authors in [22] proposed a sliding-window skyline model to study the execution of the probabilistic skyline query over uncertain data streams. The authors in [23] studied a new problem of range-based skyline queries. Two novel algorithms, I-SKY and N-SKY, were presented to solve the probabilistic and continuous range-based skyline queries.
In road networks, Huang and Jensen [24] assumed that the user's movement is constrained to a road network. The authors defined route nearest-neighbour skyline queries with computational efficiency in mind. Deng et al. [25] studied the multisource skyline query in road networks and presented three multisource query methods. The authors in [26] proposed a new method to process continuous skyline queries in road networks based on precomputing the shortest range data of targets. The authors in [27] introduced route skyline computation in a multiattribute graph. Top routes are computed iteratively in an efficient way, and a pruning technique is adopted to reduce the search space. The authors in [28] focused on extracting path skylines and proposed PathSL to generate an optimal skyline for moving objects.
Although much work has focused on the uncertainty of skyline queries or on route skyline queries, little work has addressed the skyline query under the uncertainty of moving objects. The authors in [4, 22-25] ignored the uncertainty of moving objects, while the authors in [8, 18, 21] dealt with discrete data with an independent dominant relationship between objects. This is the first work to compute the continuous probabilistic skyline query with regard to the uncertainty of moving objects in a road network.

The Probabilistic Skyline for Uncertain Moving Objects

Skyline on Certain Points. Suppose that the point set is
where Num is the number of points, each point in an -dimensional numeric space

Table 1
Position of police car    A     B     C     D     E
Number of policemen       6     6     6     4     5
Equipment level           3     3     2     3     2
Network distance to       125   130   140   155
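The classical skyline on certain points can be illustrated with a minimal sketch. It assumes that smaller values are better in every dimension (attributes such as the number of policemen, where larger is better, would be negated first). The function names `dominates` and `skyline` and the sample points are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every dimension and strictly better in one.

    Assumption: smaller values are better in all dimensions.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Naive quadratic skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical two-dimensional data, not from Table 1.
sample = [(1, 9), (3, 3), (4, 4), (9, 1), (5, 8)]
result = skyline(sample)
```

Here `(4, 4)` and `(5, 8)` are both dominated by `(3, 3)`, so only the other three points remain.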

Uncertain Model for Moving Objects.
After Wolfson et al. [1] first studied the uncertainty of moving objects, a lot of work has focused on this field [5][6][7]. An uncertain region [5] of a moving object at time , denoted by ( ), is a closed irregular region with a velocity ⃗ V such that the recorded location can be found only inside this region. The pdf represents the probability density distribution of an object in its uncertain region. The uncertainty pdf [5] of an object , denoted by ( ), is a pdf of that has a value of 0 outside ( ).

The Dominant Probability. The problem we study is as follows. There is a set of moving target objects whose centers are = { 1 , 2 , . . . , Num } ∈ and a stationary object as a query point; the goal is to continuously compute the updated probabilistic skyline set.
Unlike in the traditional skyline query, in a moving environment the spatial location changes with time. All attributes are divided into dynamic attributes and static ones. Let us suppose that there are dynamic attributes, static attributes, and attributes where = + . For each object = ( 1 , 2 , . . . , ) , the static attributes form a vector denoted by = ( 1 , 2 , . . . , ) , while the dynamic ones form a vector = ( 1 , 2 , . . . , ) . For instance, in Figure 1, the number of policemen and the equipment level are static attributes because they are invariant in our study case. However, the distance from to varies continuously as the police car moves. To simplify the description, the dynamic attributes are assumed to include only the spatial position (Table 1).
Definition 1 (dominant probability). Let and be two moving objects with uncertainty and be a stationary query point. Then the probability that dominates is
According to the definition of dominance in (1), we maintain that can have dominance probability on only if the static attributes of dominate the static ones of . As shown in Figure 2, for an arbitrary point in ( ), each point in ( ) that satisfies dist( ⃗ , ) ≤ dist( ⃗ , ) will be dominated by , as the hatched area shows.
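As a rough illustration of the dominant probability, the sketch below evaluates it numerically for two objects. It rests on several assumptions that are not stated in the text: each object's network distance to the query point is uniformly distributed over an interval, the two objects are independent, smaller static values are better, and "static attributes dominate" is read as "no worse in every static attribute" (so two cars with identical static attributes, as in the police-car example, can still dominate each other by distance). All names are hypothetical.

```python
from typing import Sequence, Tuple

Interval = Tuple[float, float]  # (min, max) network distance to the query point


def static_not_worse(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every static attribute (smaller is better)."""
    return all(x <= y for x, y in zip(a, b))


def prob_closer(u: Interval, v: Interval, steps: int = 1000) -> float:
    """Pr(dist_u <= dist_v) for independent uniform distances, by the midpoint rule."""
    (a, b), (c, d) = u, v
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * (b - a) / steps  # sampled distance of u
        if d == c:  # v is a certain point
            tail = 1.0 if x <= c else 0.0
        else:  # Pr(dist_v >= x) for dist_v ~ U[c, d]
            tail = min(1.0, max(0.0, (d - x) / (d - c)))
        total += tail
    return total / steps


def dominance_probability(u_static: Sequence[float], u_dist: Interval,
                          v_static: Sequence[float], v_dist: Interval) -> float:
    """Probability that u dominates v; zero unless u's static attributes are no worse."""
    if not static_not_worse(u_static, v_static):
        return 0.0
    return prob_closer(u_dist, v_dist)


# Two objects with equal static attributes and overlapping distance intervals.
p_uv = dominance_probability((1, 1), (120.0, 130.0), (1, 1), (125.0, 135.0))
```

With these intervals the overlap region contributes a probability of 0.125 that the second object is closer, so `p_uv` is approximately 0.875.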

The Skyline Probability
Definition 2 (skyline probability). The skyline probability of object is the likelihood that object is not dominated by any other object and is defined below: where ( ⃗ ) and ( ⃗ ) are probability density function of object and , respectively, ⃗ ∈ ( ), ⃗ ∈ ( ).
Definition 3 (p-skyline). Let p ∈ [0, 1] be a threshold. The p-skyline is the set of objects for which the following property holds:
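The following sketch shows one way the skyline probability and a threshold-based p-skyline could be computed numerically. It is not the paper's formula, which is lost in this copy. It assumes independent objects with uniformly distributed network distances, smaller-is-better static attributes, that an object can be dominated only by rivals whose static attributes are no worse, and that the p-skyline keeps objects whose skyline probability is at least p. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]
UncertainObject = Tuple[Sequence[float], Interval]  # (static attributes, distance interval)


def static_not_worse(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every static attribute (smaller is better)."""
    return all(x <= y for x, y in zip(a, b))


def uniform_cdf(x: float, c: float, d: float) -> float:
    """Pr(D <= x) for D ~ U[c, d]; a degenerate interval is a certain point."""
    if d == c:
        return 1.0 if x >= c else 0.0
    return min(1.0, max(0.0, (x - c) / (d - c)))


def skyline_probability(i: int, objects: List[UncertainObject], steps: int = 1000) -> float:
    """Integrate, over object i's own interval, the chance that no rival is closer."""
    static_i, (a, b) = objects[i]
    rivals = [iv for j, (s, iv) in enumerate(objects)
              if j != i and static_not_worse(s, static_i)]
    total = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * (b - a) / steps  # sampled distance of object i
        survive = 1.0
        for c, d in rivals:
            survive *= 1.0 - uniform_cdf(x, c, d)  # this rival is farther than x
        total += survive
    return total / steps


def p_skyline(objects: List[UncertainObject], p: float) -> List[int]:
    """Indices of objects whose skyline probability reaches the threshold p."""
    return [i for i in range(len(objects)) if skyline_probability(i, objects) >= p]


fleet = [((1, 1), (0.0, 10.0)), ((1, 1), (20.0, 30.0)), ((0, 0), (100.0, 110.0))]
members = p_skyline(fleet, 0.5)
```

In this example the second object is always farther away than the first while having the same static attributes, so it drops out, and the third object survives because nothing matches its static attributes.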

Uncertain Model in Road Network. The road network can be treated as an undirected graph = ⟨ , , ⟩, where is the node set representing crossroads, is the edge set representing roads between two crossroads, and is the length of . In a road network, the movement of objects is constrained, so the uncertain domain is denoted by a line segment with distribution ( ), where ( ) is the pdf, as Figure 3 shows.
Let (V, V ) denote the shortest path between nodes V and V . (V, V ) = ∞ if there exists no path from V to V . In a road network, the distance is represented by the shortest path.
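The shortest-path distance on an undirected weighted graph, with infinity returned when no path exists, can be sketched with Dijkstra's algorithm. The choice of Dijkstra, the adjacency-list layout, and the names `add_road` and `shortest_distance` are assumptions for illustration; the text does not say which shortest-path method the authors use.

```python
import heapq
from math import inf
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # node -> [(neighbour, edge length)]


def add_road(graph: Graph, u: str, v: str, length: float) -> None:
    """Insert an undirected edge by storing it in both directions."""
    graph.setdefault(u, []).append((v, length))
    graph.setdefault(v, []).append((u, length))


def shortest_distance(graph: Graph, source: str, target: str) -> float:
    """Dijkstra's algorithm; returns inf when target is unreachable from source."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > best.get(node, inf):
            continue  # stale heap entry
        for nxt, length in graph.get(node, []):
            nd = d + length
            if nd < best.get(nxt, inf):
                best[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return inf


roads: Graph = {}
add_road(roads, "a", "b", 2.0)
add_road(roads, "b", "c", 3.0)
add_road(roads, "a", "c", 10.0)
roads.setdefault("d", [])  # an isolated crossroad
```

Here the route from `a` to `c` through `b` has length 5.0 and beats the direct edge of length 10.0, while `d` is unreachable.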
Definition 4 (minimum distance function). The minimum distance between uncertain object and query at time is denoted by min ( , ) = ( , ) − .
Definition 5 (maximum distance function). The maximum distance between uncertain object and query at time is denoted by max ( , ) = ( , ) + .
In continuous movement, the dominant relationship of two objects will change. For two moving objects and , the maximum and minimum distance functions may intersect with each other. Two distance functions of object and are given in Figure 4. The maximum distance of is less than the minimum distance of prior to time 1 , so cannot dominate . might begin to dominate from 1 to 2 , as the hatched area shows, and likewise from 3 to 4 and from 5 to 6 .
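The period during which the distance bands of two objects overlap can be sketched for the simplest case. The sketch assumes that, over the time horizon considered, each object's central distance to the query point is linear in time (uniform speed along one stretch of its shortest path) and that the uncertain segment extends a fixed half-length on either side of the centre. The parameter names and the function `overlap_window` are hypothetical. Real distance functions in a road network are piecewise, which is why Figure 4 shows several such periods.

```python
from typing import Optional, Tuple


def overlap_window(a_u: float, s_u: float, r_u: float,
                   a_v: float, s_v: float, r_v: float,
                   horizon: float) -> Optional[Tuple[float, float]]:
    """Time window in [0, horizon] where the two distance bands overlap.

    Object u has central distance a_u + s_u * t and half-length r_u (same for v).
    The bands overlap exactly when |d_u(t) - d_v(t)| <= r_u + r_v.
    """
    gap0 = a_u - a_v   # initial difference of central distances
    rate = s_u - s_v   # how fast that difference changes
    reach = r_u + r_v  # largest difference at which the bands still touch
    if rate == 0:
        return (0.0, horizon) if abs(gap0) <= reach else None
    t1 = (-reach - gap0) / rate
    t2 = (reach - gap0) / rate
    start = max(0.0, min(t1, t2))
    end = min(horizon, max(t1, t2))
    return (start, end) if start <= end else None


# u approaches the query at speed 10 from distance 100; v waits at distance 50.
window = overlap_window(100.0, -10.0, 5.0, 50.0, 0.0, 5.0, horizon=20.0)
```

The bands first touch at time 4, when u's band is [55, 65] and v's is [45, 55], and separate again at time 6.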

Tracking by Events
The probabilistic skyline set is updated in an incremental way. The key step is to predict the time when the dominant relationship changes. It is hard to know the exact time at which the p-skyline changes, but we can estimate the period of time during which the p-skyline may change. In Figure 4, from time 1 to time 2 , the dominant relationship between and might change, because the maximum distance of is greater than the minimum distance of . So we call the time between

The Pruning Strategy for Events. By analyzing the relationship of objects, we know that if the static attributes of any two objects and have a dominant relationship, supposing that ≺ , an event may occur. As the target objects move, a considerable number of events will occur. In order to improve the efficiency of our algorithm, we propose a series of event pruning strategies.
Pruning Strategy 1. If no dominant relationship exists in the static attributes of two moving objects and , the intersection of their distance functions causes no variation in the p-skyline set. In this case, the intersection does not cause any event.
Proof. Suppose ⃗ = ( 1 , 2 , . . . , ), ⃗ = ( 1 , 2 , . . . , ), whose static attributes are ⃗ = ( 1 , 2 , . . . , ) and ⃗ = ( 1 , 2 , . . . , ), respectively. It is known that the static attributes of and do not possess any dominant relationship, so ∃ , , > and < . Even when their dynamic attributes are considered, the dominant relationship does not exist at any time. In conclusion, if no dominant relationship exists in the static attributes of two objects, the intersection of their distance functions does not change their dominant probability, and no event can take place.
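Pruning Strategy 1 can be read as a filter on object pairs that is applied before any distance functions are intersected. The sketch below assumes smaller-is-better static attributes and treats "dominant relationship in static attributes" as "no worse in every static attribute". The function `candidate_pairs` and the sample data are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def static_not_worse(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every static attribute (smaller is better)."""
    return all(x <= y for x, y in zip(a, b))


def candidate_pairs(statics: List[Sequence[float]]) -> List[Tuple[int, int]]:
    """Ordered pairs (i, j) where i may dominate j; all other pairs yield no events."""
    pairs = []
    for i, j in combinations(range(len(statics)), 2):
        if static_not_worse(statics[i], statics[j]):
            pairs.append((i, j))
        if static_not_worse(statics[j], statics[i]):
            pairs.append((j, i))
    return pairs


# Hypothetical static attribute vectors for four objects.
statics = [(1, 1), (2, 2), (1, 3), (3, 1)]
kept = candidate_pairs(statics)
```

Objects 1, 2, and 3 are mutually incomparable on their static attributes, so only the three pairs led by object 0 survive out of the six unordered pairs.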
Proof. Because ( ≺ ) = 1, it is known that ( ≺ ) = 0 prior to time begin . The update for event( , , begin , end ) is invalid. It is also invalid when end < begin .

Continuous Probabilistic Skyline Queries Algorithm
No matter how the objects move, static always belongs to the skyline. In the initialization step, the moving path and distance function of each moving object need to be precomputed from the road network and moving-object information, apart from static .

PSUR Algorithm.
The skyline probabilities of all objects do not change at the same moment. If the maximum or minimum distance function of one object crosses that of another in Cartesian coordinates, the skyline probability of this object may vary (Algorithm 2). In other words, trigger events cover all possible variations of the p-skyline for each moving object.
To simplify the problem, we suppose that all moving objects keep a uniform speed. If not, the precomputation cannot be carried out in advance, and all events must be recomputed. The movement of a moving object relative to the query point can be derived from its shortest path to the query point. The intersection times of the distance functions can be recomputed and put into an event queue sorted by time in ascending order. At each event time, the skyline probabilities of the moving objects involved in trigger events need to be updated.
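The event-driven update loop can be sketched with a priority queue. This is only an outline under assumptions, not the PSUR pseudocode, which is not reproduced in this copy: events are assumed to be tuples (begin, end, dominator, dominated), an event whose end precedes its begin is discarded as invalid, and `refresh` is a hypothetical callback that recomputes the skyline probability of the affected object.

```python
import heapq
from typing import Callable, List, Tuple

Event = Tuple[float, float, int, int]  # (begin, end, dominator id, dominated id)


def run_events(events: List[Event], horizon: float,
               refresh: Callable[[int, float], None]) -> List[Tuple[float, int]]:
    """Process valid events in ascending begin time; only touched objects are refreshed."""
    queue = [e for e in events if e[0] <= e[1] and e[0] <= horizon]
    heapq.heapify(queue)  # tuples are ordered by begin time first
    handled = []
    while queue:
        begin, _end, _dominator, dominated = heapq.heappop(queue)
        refresh(dominated, begin)  # update this object's skyline probability
        handled.append((begin, dominated))
    return handled


calls: List[Tuple[int, float]] = []
events = [(5.0, 7.0, 0, 1), (2.0, 3.0, 2, 1), (9.0, 8.0, 0, 2), (4.0, 6.0, 1, 3)]
order = run_events(events, horizon=10.0, refresh=lambda obj, t: calls.append((obj, t)))
```

Because objects that appear in no event are never touched, the work per timestamp depends on the number of events rather than on the number of objects, which is the point of the incremental approach.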

Algorithm Analysis and Discussion. The cost incurred by our method consists of three components: initialization, event computation, and updating by tracking.
In the second step, the worst-case cost of comparing the static dominant relationship between objects is $O(n(n-1)/2)$, where $n$ is the number of moving objects. Computing the intersection time of the distance functions of two arbitrary moving objects costs $O(c_1)$, where $c_1$ is a constant. Thus this step costs $O(c_1 \cdot n(n-1)/2)$.
In conclusion, the total cost is the sum of the costs of these three steps, which is equivalent to $O(n^2)$.

Experimental Evaluation
6.1. Datasets. Two real road networks are used to test the effectiveness and efficiency of the proposed algorithm PSUR. One is the famous seashore city Oldenburg in Germany, which includes 6105 nodes and 7036 edges. The other is Cixi city of China, which contains 244 intersection nodes and 407 edges, much smaller than the first one. We assume that the uncertain area is a segment along the road. Two probability density functions, uniform and Gaussian, are adopted because they are among the most important and commonly used distributions in statistics. Moving objects are generated randomly. The baseline and priority algorithms proposed in [8] are used for comparison. We conducted our experiments on a desktop PC running Windows XP Professional, with an Intel Core 2 Duo 2.93 GHz CPU and 3 GB of RAM. All experiments were coded in Visual C++ 2008.

Numbers of Moving Objects.
In this experiment, suppose = 5, Len = 10, and = 20. Figure 5 shows the performance of PSUR versus baseline and priority [8] when the number of moving objects varies on Cixi's road network, under both the uniform and the Gaussian distribution. Likewise, Figure 6 shows the performance on the road network of Oldenburg. The figures show that runtime increases with the number of objects, and that the cost on the same dataset is similar under the uniform and Gaussian models. Among the three methods, the response time of our algorithm is only one-tenth of that of the other two. This is because baseline and priority need to recompute the probabilistic skyline set at each timestamp, while PSUR is an incremental method.
For the same number of moving objects in the two real networks, the density of moving objects in Cixi is higher than that in Oldenburg, because Cixi is smaller than Oldenburg. Therefore moving objects in Cixi have more chances to interact than those in Oldenburg, and evidently more trigger events are generated in Cixi than in Oldenburg, as Tables 2 and 3 show. As a result, when comparing the two city networks under the same data model, such as Figures 5(a) and 6(a), the runtime in the smaller city Cixi is slightly higher than that in Oldenburg.
The pruning strategy is effective at reducing the event size, as Tables 2 and 3 show. Event size grows with the number of moving objects, because the number of intersections of distance functions rises. and Oldenburg city, respectively, both in uniform distribution and Gaussian distribution. The magnitude of the uncertain length may also affect the performance of algorithm PSUR. The longer the segment length is, the more runtime is required. As the length grows, for any two objects and , it becomes more likely that Pr( ≺ ) ≠ 0 and Pr( ≺ ) ≠ 1. According to the definition of an event, the size of the event queue will increase, and so will the number of times events are tracked and handled. The calculation of dominance probability and skyline probability grows accordingly, so the cost goes up.
The segment length of uncertainty also affects the event size, as shown in Tables 4 and 5. As the segment length increases, the number of events grows.
6.4. The Effect of Static Attribute Dimensionality. We now discuss the effect of static attribute dimensionality on our algorithm over real road networks. The result is shown in Figures 9 and 10. Higher dimensionality results in less runtime. This is because the dominance relation on static attributes is computed only once at the beginning. The higher the static dimensionality, the fewer objects are in a dominant relationship. Less time is then spent initializing the adjacency list and event queue, so the total runtime drops. Tables 6 and 7 show how the event size of PSUR changes with static dimensionality. Unlike in the preceding two experiments, the event size decreases as static dimensionality increases. This is because, when increases, the number of moving objects that have a dominant relationship in their static attributes decreases. The distance functions intersect less often, so there are fewer valid events.
6.5. The Effect of Time Span. Baseline and priority are essentially static algorithms. They need to recompute the skyline probability of each moving object, so their runtime is proportional to the time span. However, as a dynamic method, PSUR can update the p-skyline set by tracking events to decide which objects might vary, so its time cost is approximately the same at each timestamp, which saves more time than the other two algorithms. Figures 11 and 12 show this. Tables 8 and 9 show how the event size of PSUR changes with the length of the time span. The event size grows with increasing time span length, but the difference is not significant, and the runtime of PSUR varies only a little in Figures 11 and 12.

Conclusion
To the best of our knowledge, this is the first work to compute probabilistic skyline queries under uncertainty in a road network. In this paper, we have addressed the problem of the continuous probabilistic skyline query for moving objects. First, the attributes of objects are divided into static and dynamic ones in order to define dominant probability and skyline probability with continuous data. To update the skyline set continuously, trigger events are introduced to compute the varying skyline probability of objects. Two pruning strategies are proposed to reduce the search space and speed up this computation. On this basis, the continuous probabilistic skyline query algorithm with uncertainty in road network, named PSUR, is proposed. Finally, a series of experiments are devised to verify the effectiveness and efficiency of PSUR. The results of our experimental study for different scales of datasets on two real road networks are very encouraging. The investigation demonstrates that the proposed updating method is more effective and efficient than periodic methods.

                Event size after pruning
5      1029     39
10     1597     51
20     2264     82
50     3387     113
80     4147     137
100    4450     144