Optimal policy for composite sensing with crowdsourcing

Mobile crowdsourcing has been widely researched and applied in recent years, driven by the popularity of smartphones. In these applications, a smartphone and its user act as a whole, which we call a composite node in this article. Since a smartphone is usually operated by its user, the user's participation cannot be excluded from the applications. However, few works have noticed that humans and their smartphones depend on each other. In this article, we first model the relation between a smartphone and its user as conditional decision and sensing: the composite node makes the smartphone's sensing decision based on its user's decision. We then study the performance of the composite sensing process in a scenario composed of an application server, some objects, and users. During composite sensing, users report their sensing results to the server, and the server returns rewards to some users so as to maximize the overall reward. Under this scenario, we map the composite sensing process to a partially observable Markov decision process and design a composite sensing solution, comprising an optimal policy and a myopic policy, to maximize the overall reward. We also provide the theoretical analysis needed to establish the optimality of the optimal algorithm. Finally, we conduct experiments to evaluate the two policies in terms of average quality, sensing ratio, success report ratio, and approximation ratio, and we analyze the delay and progress proportion of the optimal policy. The experiments show that both of our policies are clearly superior to the random policy.


Introduction
With the proliferation of personal smart devices such as smartphones, humans can capture information and events from the physical world more easily than before. [1][2][3][4] Embedded with a rich set of sensors, the current smartphone can support a growing number of applications across a wide variety of domains, such as crowdsensing, 1,5-7 environmental monitoring, 8 and social networks. 9 These applications can be classified into two major classes: participatory sensing (the user is directly involved) and opportunistic sensing (the user is not involved). 5,10,11 In participatory sensing, the user acts as a preliminary sensor and decision-maker before his or her smartphone carries out a sensing task. For example, users decide whether to take part in an application, and then operate their smartphones to implement it. [12][13][14] Most of the previous works on crowdsensing consider only the smartphone; only a few suggest that crowdsensing should also include the user as a sensor rather than merely a sensor carrier and operator. [15][16][17] For example, Wang et al. 16 took humans as sensors and studied how their behavior affects sensing data quality. However, few articles have noticed that humans and their smartphones depend on each other. Two questions about this relationship deserve attention. The first is how to describe the relation between a smartphone and its user during smartphone sensing. The second is how to improve the performance of smartphone sensing by exploiting this relation. As is well known, humans have a more powerful ability of recognition than smart devices and play a key role before the process of smartphone sensing. In this article, we propose a framework to clarify the relation, and then study the performance improvement of crowdsensing under a scenario where users take part in the crowdsensing and expect a good experience.
Since a smartphone is under the control of its user, its sensing decision is made after its user expresses willingness. We design the framework as conditional sensing, as shown in Figure 1, where each user takes the action ''sleeping'' if he or she is not willing to take part in the smartphone sensing. The scenario studied in this article represents a class of common applications in participatory sensing, where some users are asked to implement a certain task, such as detecting an interesting object or event around them. We further investigate the case where the users have a limited cost budget to implement the task and require a certain probability of successful implementation, denoted by ζ.

Summary of key contributions
The key contributions of this article are as follows:
1. We study the relationship between humans and smartphones during smartphone sensing, and propose the composite sensing framework.
2. We study the object detection scenario and formulate the composite sensing problem, that is, how to improve the user experience under the composite sensing framework, as a partially observable Markov decision process (POMDP). We also design a new scheme, called the composite sensing policy, to solve the composite sensing problem and obtain the maximal overall sensing quality.
3. We provide theoretical and experimental analysis of the composite sensing policy. The optimality of the policy is guaranteed theoretically, while the experimental results demonstrate the performance of the proposed optimal and myopic policies.

Road map
This article is organized as follows. Related works are reviewed in section ''Related work.'' Section ''Preliminaries'' presents the composite sensing and system models. We formulate the composite sensing problem and map it to a POMDP in section ''Composite sensing problem.'' The composite sensing policy is designed and its theoretical performance is presented in section ''Composite sensing policy.'' The performance of our solution is evaluated experimentally in section ''Experiment results.'' The article is concluded in section ''Conclusion.''

Related work
Today's smartphone is embedded with a number of specialized sensors, including a camera, a global positioning system (GPS) receiver, a digital compass, and so on. It can sense environmental information and share that information with friends of its holder or report it to a server. 13 It has become not only the core communication device in people's daily lives but also a smart sensing device for environmental monitoring, smart transportation systems, social networks, and so on. 10 Its applications are thus widely exploited and extend to many more areas than before. According to the awareness and involvement of the user as a sensing-device custodian, smartphone applications can be classified into two major classes: participatory sensing (the user is directly involved) and opportunistic sensing (the user is not involved). 10 Participatory sensing includes both the smartphone and its holder in the significant decision stages of the sensing application. One type of relation between the smartphone and its holder is the composite sensing proposed in this article.

Participatory sensing
A wide range of environmental information, such as road traffic, can be sensed and disseminated by ordinary citizens with smartphones. This brings a new way to develop many application areas, such as environmental monitoring and social networks. Interesting examples include road traffic monitoring, 18 SmartPhoto, 17 and Ear-phone. 14 Rana et al. 14 designed an end-to-end participatory urban noise mapping system called Ear-phone. The key idea of Ear-phone is to crowdsource the collection of urban noise to people who carry smartphones equipped with sensors and location-providing GPS receivers. In the end-to-end system, the urban noise is sent to a central server, where a noise map is reconstructed and then provided to the end user. In VTrack, participatory drivers with smartphones send their locations, estimated by Wi-Fi or GPS, to a central server in real time, and the server provides users with the real-time routes with the minimal travel time. 19 Mohan et al. 18 presented TrafficSense to monitor road and traffic conditions in settings with much more varied road conditions (e.g. potholed roads), chaotic traffic (e.g. a lot of braking and honking), and a heterogeneous mix of vehicles (two-wheelers, three-wheelers, cars, buses, etc.). Wang et al. 17 proposed a framework, called SmartPhoto, to quantify the quality (utility) of crowdsourced photos based on accessible geographical and geometrical information (called metadata), including the smartphone orientation, position, and all related parameters of the built-in camera. The sensed photos are sent to a server by the participants, and different rewards are fed back to them because the smartphone orientation and position lead to different sensing qualities. New applications keep appearing, such as CrowdAtlas, which generates a high-quality map by crowdsourcing. 20 For more details on smartphone sensing, we refer interested readers to the survey articles. 2,10
From observing the related works on smartphone applications, we can identify the following features: (1) sensing result report: many smartphone applications require the participants to report their sensed information to central servers; and (2) human acts as sensor: in smartphone applications with participatory sensing, the human is a key part of the system and makes the key decisions about sensing the environmental information. Not all users are willing to be participants, and not all of their sensing results have equal value because the smartphone types and sensing conditions may differ. 13,16

Human as sensor
A human's decision is a necessary part of smartphone applications with participatory sensing and greatly affects the sensing result. For example, SmartPhoto needs humans to observe the Event of Interest (EoI) and then take pictures. 17 Most current smartphone sensing applications are based on voluntary participation. 13,21 In these applications, 13 humans first estimate the incentive reward and then operate their smartphones to participate if satisfied, or they first observe the EoI and then decide to collect and report the information about it if it is observed and satisfies the requirement. 14,18 Zhao et al. 21 showed that mobile crowdsourced sensing (MCS) is a new paradigm that takes advantage of pervasive smartphones to efficiently collect data, enabling numerous novel applications. They proposed incentive mechanisms, which are necessary to attract more user participation and achieve good service quality for an MCS application. ND Lane et al. 2 surveyed existing mobile phone sensing algorithms, applications, and systems. They also discussed emerging sensing paradigms and formulated an architectural framework for discussing a number of the open issues and challenges in the new area of mobile phone sensing research.
The smartphones' decisions are based on their users' observations and decisions. This is an underlying phenomenon in smartphone sensing applications. Wang et al. 16 used humans as sensors and studied how their decisions affect sensing data quality. Although humans make key decisions in smartphone applications with participatory sensing, most previous works either make a simple assumption about the human's decision or ignore it altogether. Furthermore, the participant's decision and its relationship with his or her smartphone are rarely considered and researched.

Preliminaries
Object, observing, and sensing model

This article considers a set V of composite nodes that sense a set of m objects. An object can be a target, such as a famous building, 17 or an EoI, such as a cellular or Wi-Fi signal. 22 As shown in Figure 2(a), each object is assumed to have an orientation and K aspects. Let the parameter u, u ∈ {1, ..., K}, denote the aspect facing a node. For example, u = 2 means that the second aspect of the object o_j faces the node. When the node takes the action to sense aspect u of the object, the action results in a certain sensing quality q(u), 0 ≤ q(u) ≤ 1. In this article, the sensing quality is defined as a function of the aspect, as given in equation (1). Each user's observing range is modeled as a disk, as shown in Figure 2(b), and the smartphone's sensing range is modeled as a fan-shaped sensing area in Figure 2(c). They have the same radius, since the user would not notice an object outside the observing range. The smartphone can fix a direction to sense one of the objects in its sensing range, as shown in Figure 2(c). Let the object ID denote the direction that the node chooses. The example in Figure 3 shows that the node has as many directions as there are objects.

Conditional sensing
In the crowdsourcing applications with participatory sensing, the smartphone is under the control of its user. Each user acts as the preliminary sensor and implements the composite operation together with his or her smartphone as a whole. We call such a whole a composite node (node for short), as shown in Figure 1. In each node, the user makes the observing decision a ∈ {0 (sleeping), 1 (observing)} to observe the states of the objects in the composite node's sensing range, and then the smartphone makes the sensing decision b ∈ {0 (non-sensing), 1 (sensing)}. The node thus implements the composite sensing via conditional decision-making: the sensing decision is based on the observing decision, as shown in Figure 4. Under the observing decision a = 0, the node sleeps. Otherwise, the user observes the objects' states and obtains the observation outcome Y_{j,k}(t): u_j(t) = k, where t is the time slot in the period T. Given the observation outcome Y(t), the smartphone makes the sensing decision. If the sensing decision is b_{j,k}(t) = 1, the smartphone chooses the direction of object o_j to sense. Otherwise, the node turns to sleep. The observing and sensing decisions compose the decision space A, that is, A = {a, b}.
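The conditional decision chain above can be sketched in a few lines of code. This is a minimal illustration, not the paper's implementation; the function and argument names (composite_decide, k_threshold, and so on) are hypothetical, and aspect 0 is used here to encode a disappeared object.

```python
def composite_decide(user_willing, object_aspects, k_threshold):
    """One slot of the conditional decision chain: the observing decision a is
    made by the user; only when a = 1 can the smartphone make a sensing
    decision b about one observed object (a = 0 forces the node to sleep)."""
    a = 1 if user_willing else 0          # user's observing decision
    if a == 0:
        return a, 0, None                 # sleeping: sensing is skipped entirely
    # the user observes the aspect k facing the node (k = 0 means disappeared)
    appeared = {obj: k for obj, k in object_aspects.items() if k > 0}
    if not appeared:
        return a, 0, None                 # nothing to sense; the node turns to sleep
    obj, k = max(appeared.items(), key=lambda kv: kv[1])
    b = 1 if k >= k_threshold else 0      # smartphone's conditional sensing decision
    return a, b, obj if b else None
```

The point of the sketch is the ordering: b is computed only inside the branch where a = 1, mirroring the conditional structure of Figure 4.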
In the following, we present the composite sensing from the view of an arbitrary node. The objects referred to are those in the sensing range of that node.

System model
This article studies the scenario where the nodes and objects are static and uniformly randomly deployed in the area of interest. Together with an additional server, these nodes and objects compose the composite sensing system. In each time slot, each object o_j is in one of two states: disappear or appear. The object state is clarified by the following two concepts: object state and system state.

Definition 1
Object state. The object state indicates the appearance of an object o_j in each time slot t and is denoted by z_j(t), where z_j(t) ∈ {0 (disappear), 1 (appear)}.
The design of the optimal observing and sensing decisions uses the definition of the object state. When an object is in the disappear state, that is, z_j(t) = 0, it cannot be observed by any node. When the object is in the appear state, that is, z_j(t) = 1, it can be observed, and one of its K aspects faces a node. Assume that each object has equal transition probabilities among the disappear state and the K aspects, that is, p(u′|u) = p(u|u′) and p(z = 0|u) = p(u|z = 0), and that its state transitions are independent of the other objects. Suppose that there are m objects around the node. The definition of the system state is given below.
Definition 2

System state. The system state is the collection of the states of the m objects and is denoted by s(t) = (z_1(t), ..., z_m(t)).

Given a sequence of time slots t ∈ T, this article assumes that the system states s(t) form a Markov chain with the state space P = {0, 1}^m. To earn rewards, each node observes and senses the objects around it, and then reports the sensing results to the server. Let g_j^i(u = k) denote the report of node v_i for object o_j when o_j's kth aspect faces v_i. The sensing quality of the report g_j^i(u = k) is thus q_j^i(u = k). If the report is accepted by the server, the server returns an acknowledgment to the node with a certain reward. In this article, the server adopts the non-separable sensing quality rule in equation (2) to choose among the nodes' reports: for the same object, the server accepts the report with the maximal sensing quality, that is, q_j = max_{v_i ∈ V} q_j^i, where q_j^i is the sensing quality reported by node v_i for object o_j, and more than one node may sense the same object o_j simultaneously. By this rule, the report g_j^i(u_j = k) can be successful only if no other report g_j^{i′}(u_j = k′) for the same object o_j has an aspect with higher quality, that is, k′ ≤ k. Let g_j(u_j = k) ∈ {0, 1} denote the reported state of object o_j: g_j(u = k) = 1 means that there is no report with an aspect higher than k; otherwise, g_j(u = k) = 0. Most of the symbols and their meanings are summarized in Table 1.
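The server's acceptance rule of equation (2) amounts to a per-object maximum over the submitted qualities. A minimal sketch, with hypothetical names (accept_reports, and a (node, object) -> quality map as input):

```python
def accept_reports(reports):
    """Server-side selection rule sketched around equation (2): among all
    reports for the same object, only the one with the maximal sensing
    quality is accepted. `reports` maps (node, object) -> quality."""
    best = {}
    for (node, obj), q in reports.items():
        if obj not in best or q > best[obj][1]:
            best[obj] = (node, q)   # keep the currently best report per object
    return best
```

For example, if v1 and v2 both report object o1 with qualities 0.4 and 0.7, only v2's report for o1 is accepted and rewarded.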

Composite sensing problem
This section presents the composite sensing process, whose goal is to maximize the overall sensing quality, and then maps it to a POMDP.

Composite sensing system
The composite sensing system, illustrated in Figure 5, carries out the crowdsourcing task in four parts: task broadcast, composite sensing process, report, and reward.
Task broadcast. The application server broadcasts advertisements to the users to attract them to participate in the task: to sense the objects in their sensing ranges. After a node accepts the task, it implements the composite sensing process to maximize the reward returned by the server.
Composite sensing process. Each node implements the composite sensing process, which is composed of the conditional decisions made in a series of time slots. In each time slot, the observing decision a(t) is made first, according to the historical observations and decisions stored in the historical information vector H(t). Based on its outcome, the sensing decision b(t) is then made.
Observing decision. At the beginning of each time slot t, the node makes the observing decision. If the observing decision is sleeping, the smartphone must also choose the sleeping sensing decision in this slot. Otherwise, the user chooses one direction, that is, one object o_j, to observe. If the object's state is appear, that is, z_j(t) = 1, the node can observe its orientation, as shown in Figure 2(b). After the observation, the node obtains the observation outcome: the object state z_j(t) and its orientation u_j(t). Given the system state s(t) = s and the observing decision a = 1, the conditional PMF (probability mass function) of the observation outcome u_j(t) = k for object o_j is given in equation (3), where k = 0 indicates that no aspect can be observed when object o_j disappears, and p(u_j(t) = k | z_j = 1) is the conditional probability that the observation outcome is u_j(t) = k when the object state is z_j = 1.
Sensing decision. After the observing decision for object o_j, the node makes the sensing decision b_j(t) in the slot. If the observing decision is a(t) = 0, the sensing decision must be sleeping, that is, b_j(t) = 0. Otherwise, the smartphone makes the sensing decision according to the observation outcome u_j(t) = k. The node decides to sense object o_j, that is, b_j(t) = 1, with the probability given in equation (4), where the conditional probability p(b_j(t) = 1 | u_j(t) = k) ∈ [0, 1].
Report. After the sensing decision is made and the sensing result q(u_j(t)) is achieved, the result is reported to the server. The server chooses the report with the maximal sensing quality for each object by the rule given in equation (2) and feeds the reward back to the reporting node. In this case, the node's report is called a successful report. Denote the successful report for object o_j by c_j(k) when the observation outcome is u_j(t) = k and k > 0. The node with the successful report can thus obtain some reward from the server and counts its successful report probability, denoted by p_r(g_j(k)). Recall that a successful report can be obtained only after the observing decision, the sensing decision, and the report are all taken. The successful report probability p_r(g_j(k)) can therefore be formulated as in equation (5), where the last equality is obtained from equations (3) and (4).
Reward. The reward, denoted by r(t), for the successful report is defined to be a monotonically increasing function of the aspect. This article uses the sensing quality as the reward, which means that a successful report with higher sensing quality obtains a higher reward according to equation (1). Recalling the definition of the composite sensing process in section ''Conditional sensing,'' the reward can be obtained only after the observing decision a(t) = 1 and the sensing decision b_j(t) = 1. The immediate reward r(t) in slot t is then given by equation (7). Notice that the node chooses only one object to sense each time if its sensing decision is b > 0. It is willing to choose the object that can yield the highest possible sensing quality and successful reporting probability. The objects have their own states, appear or disappear, which compose the state space P. They switch between states from one time slot t to the next time slot t + 1 with some probabilities p_P.
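The chain of equations (3) to (7) says that a reward q(k) is collected only when three stages all succeed: aspect k is observed, the smartphone decides to sense it, and the report wins at the server. A small sketch of the resulting expected one-slot reward, with hypothetical argument names (the per-aspect dictionaries are stand-ins for the paper's conditional probabilities):

```python
def expected_immediate_reward(aspect_pmf, p_sense, p_win, quality):
    """Expected one-slot reward implied by equations (3)-(7) (a sketch):
    a reward quality[k] is earned only if aspect k is observed
    (aspect_pmf[k]), the smartphone decides to sense it (p_sense[k]),
    and the report is accepted by the server (p_win[k])."""
    return sum(aspect_pmf[k] * p_sense[k] * p_win[k] * quality[k]
               for k in aspect_pmf)
```

With two aspects observed equally often, sensing only the second one, this multiplies out the three stage probabilities and the quality for each aspect and sums them.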

Convert to POMDP
The composite sensing process can be mapped to a POMDP. In the process, the node observes only part of the objects around it, and the report result cannot be directly known after it reports the sensing result to the server. The system states thus cannot be fully observed. In the following, this article formulates the composite sensing process as a POMDP given by a tuple ⟨P, Y, A, P, q⟩:
P is the set of the objects' states in the node's sensing range. Y is a finite set of sensing and report results, that is, u, g ∈ Y. A is the decision space, that is, A = {a, b}, a ∈ {0, 1}, b ∈ {0, 1, ..., K}. P is the set of system state transition probabilities: P = {p(s′|s)}, for all s, s′ ∈ P. q(u): A × P → (0, 1] is the sensing quality function.
Belief vector. In the composite sensing process, the node makes its decision according to the historical information H(t) at the beginning of each time slot. The historical information vector H(t) is updated in each time slot t; as time goes on, the size of H(t) grows quite large. Smallwood et al. 23 showed that the conditional probability, denoted by B(t), of the system states of the objects around the node, given its decision and observation history H(t), is a sufficient statistic for these objects' historical states. B(t) is called the belief vector of the node for the states of the objects around it at the end of each time slot t − 1. Each of its elements, called a belief state, is the conditional probability (given the observing and sensing history) that the objects' state is s at the beginning of slot t + 1, prior to the state transition. B(t) can be updated from B(t − 1) and the decisions and report results in slot t. We introduce an updating function T to implement the update of the belief vector, that is, B(t) = T(B(t − 1) | Y(t), A(t)). This article adopts a reward-based updating function T: B(t + 1) = T(B(t), Y(t), A(t), C(t)). Based on Bayes' rule, the update of B(t + 1) is calculated in two cases. When the observing decision puts the node to sleep, that is, a = 0, the belief vector is updated based solely on the underlying Markovian model of the object states, that is, B(t + 1) = T(B(t) | a = 0), and the belief element is updated by equation (9). When the user takes the observing decision a(t) = 1, the node can observe the system state s(t) = z(t) with the probability given in equation (3).
The information state can be updated by Bayes' rule: 24 when the node is in state s′ at slot t, the belief state is the probability that the state is s at slot t + 1, where the denominator is a normalizing constant given by the sum of the numerator over all values of s ∈ P, and p_s(k|s) is given according to equation (4).
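The two-step structure of this Bayes update (predict through the Markov transition, then reweight by the observation likelihood and renormalize) can be sketched as follows. This is a generic illustration under the stated Markov model, not the paper's exact equations; the function and argument names are hypothetical.

```python
def belief_update(belief, transition, likelihood):
    """One Bayes-rule belief update of the kind described above (sketch):
    predict the belief through the Markov transition p(s'|s), weight by the
    observation likelihood p(y|s'), and renormalize; the denominator is the
    normalizing constant summed over all states."""
    n = len(belief)
    # prediction step: push the belief through the Markov chain
    predicted = [sum(belief[s] * transition[s][s2] for s in range(n))
                 for s2 in range(n)]
    # correction step: weight by the likelihood of the observation
    unnorm = [predicted[s2] * likelihood[s2] for s2 in range(n)]
    z = sum(unnorm)
    return [u / z for u in unnorm] if z > 0 else predicted
```

When the node sleeps (a = 0), the likelihood is uninformative and only the prediction step acts, matching the first update case above.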
Objective. A composite sensing policy is a sequence of decision couples ⟨a(t), b(t)⟩, t ∈ T. The optimal policy, denoted by ⟨a(t), b(t)⟩*, t ∈ T, maximizes the expected overall sensing quality in T under the constraint that the successful reporting probability reaches the threshold ζ. This is equivalent to finding the optimal policy for the finite-horizon constrained POMDP. Recalling the immediate reward given in equation (7), the goal of the optimal policy is given in equation (11), where B(0) is the initial belief vector for the object states and ζ is the threshold for the success report probability.

Composite sensing policy
Some previous works, such as the one-pass algorithm, 23 can compute the sequence of optimal decisions. However, the computational complexity required to obtain the optimal decision increases exponentially with the size of the state space and can be very high for a general POMDP. 25 One alternative method for addressing this problem is to design a myopic policy. 25 A myopic policy focuses on the immediate reward and ignores the impact of the current decision on future rewards; it is generally suboptimal. In this section, we exploit some specific properties of the composite sensing system: monotonicity and the independence between the actions and the object states. With these properties, the computation of the optimal policy given in this section can be simplified.

Value function
The key step in making the composite decision is to measure how good the previous decision is. The value function expresses the objective in equation (11) explicitly as a function of the belief vector B and the observing and sensing decisions ⟨a, b⟩. Let F(B(t), A) denote the value function, which is the maximum expected total reward that can be accumulated starting from t given the belief state B(t). Making the decision ⟨a, b⟩ in time slot t accumulates the reward from t in two parts: the immediate reward given in equation (7) and the maximum expected future reward F(B(t + 1), A). Considering all possible system states s ∈ P and the successful report probability in equation (5), and then maximizing over all possible decisions in A, we arrive at the value function in equation (13), where the first term on the right is the expected immediate reward r(B, A), and the future reward F_t(B(t + 1)) can be calculated from the future belief vector B(t + 1) with Bayes' rule. 26,27 The immediate reward r(B, A) is achieved in the current time slot by taking the sensing action and is given as r(B, A) = c(g|s) q_j.
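The recursive structure just described, immediate reward plus observation-weighted future value, can be sketched as a finite-horizon recursion. This is a generic Bellman-style sketch that ignores the report-probability constraint; reward, obs_prob, and update are caller-supplied stand-ins for the paper's r(B, A), observation probabilities, and updating function T.

```python
def value(belief, t, horizon, actions, reward, obs_prob, update):
    """Recursive sketch of the finite-horizon value function:
    F_t(B) = max_A [ r(B, A) + sum_y p(y|B, A) * F_{t+1}(T(B|y, A)) ]."""
    if t == horizon:
        return 0.0                        # no reward beyond the horizon T
    return max(
        reward(belief, a) + sum(
            p * value(update(belief, a, y), t + 1, horizon,
                      actions, reward, obs_prob, update)
            for y, p in obs_prob(belief, a).items())
        for a in actions)
```

Even this toy recursion branches over every action and observation per slot, which is the exponential blow-up motivating the structural results and the myopic policy below.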

Optimal composite sensing policy
This section analyzes the properties of the composite sensing process: (1) the monotonicity of the value function and (2) the monotonicity of the success report probability. With these properties, we obtain an explicit optimal design for the composite sensing process: a deterministic optimal sensing policy in Lemma 2 and an observing policy in Lemma 3.

Lemma 1
Monotonicity of success report probability. Given the sensing decision b = 1, the success report probability p_r(c_j(k)) increases with the observation outcome u(t) = k, that is, p_r(c_j(k′)) ≥ p_r(c_j(k)) for k′ ≥ k.
The proof of Lemma 1 is given in Appendix 1.

Theorem 1
Monotonicity of value function. The value function F(B, u) is monotonically increasing in the aspect u, that is, F(B, u′) ≥ F(B, u) for u′ ≥ u. The proof of Theorem 1 is given in Appendix 1.
Recall that the objective of the composite sensing process is to maximize the overall reward under the constraint on the successful report probability, as given in equation (11). If there were no constraint, the node would simply wake up in every time slot so as to maximize the overall outcome. With the constraint given in equation (12), the composite sensing decisions must be made carefully. Since the successful report probability increases monotonically with the aspect u, as claimed in Lemma 1, there must be an aspect, denoted by u(t) = k̄, such that the condition in equation (14) is satisfied given an observation outcome u(t) > 0. According to equation (6), the successful report probability is affected by both the observing and sensing decisions. By Lemma 1, the sensing decision b = 1 with a higher observation outcome u(t) = k results in a higher success report probability p(c_j(k′)). According to Theorem 1, the value function increases monotonically with the success report probability p(c_j(k′)). Therefore, we can construct a threshold-structured optimal sensing decision, given in the lemma below.

Lemma 2
Optimal sensing decision. Given the observation outcome u(t) = k, the optimal sensing decision is b = 1 if k ≥ k̄ and b = 0 otherwise, where the threshold aspect k̄ is defined in equation (14).
The next step is to design the optimal observing decision, which chooses the best object to observe in each time slot, since there are m objects. It is easy to see that there is no chance to obtain a reward if the object is in the disappear state, that is, z = 0. Lemma 2 shows that the sensing decision must be taken only if the observation outcome satisfies k ≥ k̄ in order to meet the constraint in equation (12). For the constrained composite sensing process, the observing decision therefore has to choose an object whose state is z = 1 with aspect k ≥ k̄. The threshold aspect k̄ divides the object states into two groups, denoted by z̃ = 1 and z̃ = 0. The first group, z̃ = 0, includes the states with z = 0, or z = 1 with aspect k < k̄. The second group, z̃ = 1, includes the states with z = 1 and aspect k ≥ k̄. For each object o_j, we also define two transition probabilities, s_j and f_j, between the two group states. These two probabilities can be calculated and updated from the transition probabilities of the system states given in equation (9) or (10). Because one object's states are independent of the others', the probability that object o_j is in the group state z̃(t + 1) = 1 can be updated from the observation outcome in the previous time slot. The following lemma determines the optimal observing decision.
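The threshold structure of Lemma 2 is easy to sketch in code. This is an illustration under the monotonicity of Lemma 1, not the paper's implementation; the names threshold_aspect, sensing_decision, and the p_r dictionary are hypothetical.

```python
def threshold_aspect(p_r, zeta):
    """Smallest aspect k_bar whose success-report probability meets the
    threshold zeta; well defined because p_r(k) is monotonically increasing
    in k (Lemma 1). Returns None if no aspect qualifies."""
    for k in sorted(p_r):
        if p_r[k] >= zeta:
            return k
    return None

def sensing_decision(observed_aspect, k_bar):
    """Threshold-structured sensing decision of Lemma 2 (sketch):
    sense (b = 1) only when the observed aspect reaches k_bar."""
    return 1 if k_bar is not None and observed_aspect >= k_bar else 0
```

Monotonicity is what makes a single threshold sufficient: once p_r(k) crosses ζ, it stays above ζ for every larger aspect, so "sense iff k ≥ k̄" loses nothing.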

Lemma 3
Optimal observing decision. Suppose that there are m objects. Given the observation outcome in the previous slot t − 1, the optimal observing decision in time slot t is to choose the object o_j with the maximal probability of being in the group state z̃ = 1, that is, o_j = arg max_{o_j} c_j(k), for k = 1, ..., K.
Proof. According to the definition of the group state z̃(t + 1), the observing decision keeps the node active when the object state is z(t) = 1 and u(t) = k with k ≥ k̄. Thus, the constraint is satisfied by the observing decision, and the object state that results in the maximal value function must be contained in the group state.
Next, we prove by induction that the value of the observing decision given in Lemma 3 is maximal. According to the system model in section ''System model,'' the object states have equal transition probabilities among themselves, and the transition probabilities do not change with time. When the observing decision puts the node to sleep, that is, a(t) = 0, there is no chance to produce any observation outcome by equation (3), that is, p_o(k|s) = 0; thus the probability in equation (6) is 0, and the value function contributes nothing in this case. When the observing decision makes the node observe an object, that is, a(t) = 1, the belief vector can be updated by equation (9), so we have B(t + 1) = T(B(t) | a = 1). Under the observing decision in Lemma 3, the probability p(z_j = 1 | s(t) = s) in equation (6) for the object states in the group state z̃ = 1 is maximized. For each object, the transition probabilities between any two states are equal, that is, p(z_i | z_j) = p(z_j | z_i). Therefore, the observing decision given in Lemma 3 is optimal.

Optimality of myopic policy
A myopic policy does not consider the impact of the current action on the future or long-term reward, and focuses solely on maximizing the expected immediate reward. It is usually suboptimal for a general POMDP. The myopic policy does not need to estimate the future reward, so its computational complexity is reduced. In this article, the myopic policy only considers the impact on the next time slot, so we modify the value function as the following equation. The description of the myopic policy is nearly identical to that of the optimal one, except that equation (13) in step 5 of Algorithm 1 is replaced by equation (19).
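The contrast between the two value functions can be sketched as follows; `bellman_value` and `myopic_value` are illustrative stand-ins for equations (13) and (19), not their exact forms from the paper:

```python
def bellman_value(immediate_reward, transition_probs, next_values):
    """Long-horizon value (in the spirit of equation (13)): immediate
    reward plus the expected value of the successor information states."""
    return immediate_reward + sum(
        p * v for p, v in zip(transition_probs, next_values)
    )

def myopic_value(immediate_reward):
    """Myopic value (in the spirit of equation (19)): the expected
    immediate reward only, so no successor values need to be computed."""
    return immediate_reward
```

Dropping the expectation over successor states is exactly what removes the need to enumerate future information states, which is the source of the myopic policy's reduced per-slot cost.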

Experiment results
In this section, we conduct numerical experiments and simulations to verify the performance of our optimal and myopic policies by comparing them with a randomized algorithm, which simply selects some objects at random in each round. We numerically analyze the impact of the number of iterations and of different thresholds on the average quality, sensing ratio, success report ratio, and algorithm approximate ratio under the proposed algorithms. Besides, we give the progress proportion and delay analysis of the optimal policy.

Algorithm 1. Optimal policy.
Input: initial belief vector B(0) and threshold ζ. Output: overall quality q(T).
1: List all possible information states (B, p_r(g_j(k))), v_j ∈ V, k = 1, …, K, that each node may go through. Let B contain all such states that satisfy the constraint in inequality (12).
2: Set the value to 0 for all states (B, p_r(g_j(k))) with p_r(g_j(k)) < ζ, v_j ∈ V, k = 1, …, K;
3: while t <= T do
4:   if B is nonempty then
5:     Compute the value function for the states (B, p_r(g_j(k))) ∈ B with equations (9) and (13);
6:     Get the maximal quality over all objects and remove its state from the set B;
7:   end if
8:   t = t + 1
9: end while
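The steps of Algorithm 1 can be sketched as runnable code under simplified, assumed interfaces: the information states are reduced to plain (state, report probability) pairs, and `value_fn` stands in for the value function of equations (9) and (13); none of these names come from the paper.

```python
def optimal_policy(states, value_fn, zeta, T):
    """Sketch of Algorithm 1 under simplified, assumed interfaces.

    states   -- list of (state, report_prob) information-state pairs
    value_fn -- returns the value of a state (placeholder for eqs. (9), (13))
    zeta     -- success-report threshold from constraint (12)
    T        -- number of time slots
    """
    # Steps 1-2: keep only states whose report probability meets the threshold.
    B = [(s, p) for (s, p) in states if p >= zeta]
    overall_quality = 0.0
    t = 0
    while t <= T:                                   # step 3
        if B:                                       # step 4
            values = [value_fn(s) for s, _ in B]    # step 5
            best = values.index(max(values))        # step 6
            overall_quality += values[best]
            B.pop(best)
        t += 1                                      # step 8
    return overall_quality
```

In each slot the highest-valued remaining state is consumed, so the loop terminates once either the horizon T is reached or the feasible set B is exhausted.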

Evaluation setup
To better validate the performance of our proposed algorithms, we build a test bed and conduct field experiments. Our evaluation field is divided into three disks according to the observing ranges of the composite nodes v_1, v_2, v_3. Seven objects are uniformly and randomly deployed in the field. The possible states of the seven objects in each time slot are appearance and disappearance, and the states in different time slots are independent of each other. If an object appears, its orientation is also randomly distributed, and the orientations in different time slots are likewise independent. In the following Figures 6 and 7, we consider the average quality, sensing ratio, success report ratio, and algorithm approximate ratio as evaluation metrics under two varying parameters: the number of iterations and the threshold.
Performance comparison

Average quality. Figure 6(a) shows the average quality obtained by the optimal, myopic, and randomized policies, respectively, under different numbers of iterations and a fixed threshold value ζ = 0.1. After almost 200 iterations, the optimal policy reaches a stable average quality of about 1.19. The average quality achieved by the myopic policy is about 0.88 after nearly 500 iterations. In contrast, the average quality of the random policy is about 0.73 after 500 iterations, which is much lower than the other policies, as shown in Figure 6(a).
As shown in Figure 7(a), we evaluate the average qualities obtained by the optimal and myopic policies, compared with the random policy, when we set various thresholds and keep the number of iterations fixed at 1500. As the threshold increases, the optimal policy always maintains a good expected average quality of about 1.2. In contrast, the myopic and random policies show weaker performance. When the threshold is in [0, 0.3], the myopic policy achieves an average quality of about 0.92 and the random policy about 0.79. When the threshold is greater than 0.3, their average qualities drop badly.
Sensing ratio. As mentioned in equation (12), when the success report probability is less than the threshold ζ, the sensing action is not taken in the optimal and myopic policies. Figure 6(b) counts the ratio of the number of sensing actions to the number of observing actions as the number of iterations increases from 0 to 1700. It reflects the sensing probability obtained by the optimal, myopic, and randomized policies after observing objects. Again, we set the threshold to the fixed value ζ = 0.1. After almost 300 iterations, the optimal policy reaches a stable sensing ratio of about 84%. The sensing ratio obtained by the myopic policy is about 80% after nearly 200 iterations. In contrast, the sensing ratio of the random policy is about 68% after 300 iterations, which is much lower than the other policies, as shown in Figure 6(b).
As shown in Figure 7(b), we evaluate the sensing ratio obtained by the optimal and myopic policies, compared to the random policy, when we set various threshold values and keep the number of iterations fixed at 1500. The optimal policy always maintains a good sensing ratio of about 80%. In contrast, the myopic and random policies show weaker performance. When the threshold is in [0, 0.4], the sensing ratio obtained by the myopic policy is about 65% and that obtained by the random policy is about 54%. When the threshold is greater than 0.4, their sensing ratios drop badly.
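The sensing ratio and success report ratio plotted in Figures 6(b) and 6(c) are simple count ratios; a minimal sketch, assuming the per-run counts are tallied separately (the function and argument names are ours):

```python
def sensing_metrics(num_observe, num_sense, num_success):
    """Evaluation ratios (assumed definitions):
    sensing ratio        = sensing actions / observing actions
    success report ratio = successful reports / observing actions
    """
    return num_sense / num_observe, num_success / num_observe
```

For instance, 84 sensing actions and 82 successful reports out of 100 observing actions give ratios of 84% and 82%, matching the stable values reported for the optimal policy.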
Success report ratio. As mentioned in equation (2), the server only accepts the maximal sensing quality among the nodes' reports for the same object.

Figure 6. Convergence of the optimal, myopic, and random policies: (a) average quality, (b) sensing ratio, (c) success report ratio, and (d) approximate ratio.
Therefore, the success report ratio is also one of the criteria for evaluating how good a strategy is. Figure 6(c) counts the ratio of the number of successful reports to the number of observing actions as the number of iterations increases from 0 to 1700. We set the threshold value to 0.1. It reflects the success report probability obtained by the optimal, myopic, and randomized policies after observing objects. After almost 300 iterations, the optimal policy reaches a stable success report ratio of about 82%. The success report ratio of the myopic policy is about 79% after nearly 400 iterations. In contrast, the success report ratio of the random policy is about 67% after 300 iterations, which is clearly lower than the other policies, as shown in Figure 6(c). As shown in Figure 7(c), we evaluate the success report ratio obtained by the optimal and myopic policies, compared to the random policy, when we set various threshold values and keep the number of iterations fixed at 1500. The optimal strategy always maintains a good success report ratio of about 84%, while the myopic policy shows weaker performance with a 79% success report ratio. The success report ratio of the random policy is only about 68%.
Approximate ratio. The approximation ratio measures the performance gap between our policies, and it reflects the relative performance of the optimal, myopic, and randomized policies clearly. Again, we set the threshold value ζ to 0.1. In Figure 6(d), the blue curve shows the approximate ratio between the myopic and optimal policies as the number of iterations increases from 0 to 1700; the performance of both policies becomes stable after nearly 200 iterations, and their approximation ratio is finally about 78%. The orange curve shows the approximation ratio between the random and optimal policies over the same range; it becomes stable after nearly 200 iterations, at about 73%. The green curve shows the approximation ratio between the random and myopic policies; after nearly 150 iterations it stabilizes at about 77%.
As shown in Figure 7(d), we evaluate the approximation ratios among the optimal, myopic, and randomized policies when we set various thresholds and keep the number of iterations fixed at 1500. The blue curve shows the approximate ratio between the myopic and optimal policies as the threshold varies over [0, 0.5]; it is about 78% and relatively stable in the interval [0, 0.35], and when the threshold exceeds 0.35 it suddenly drops to around 40%. The orange curve shows the approximation ratio between the random and optimal policies; it is about 60% and relatively stable in the interval [0, 0.4], and when the threshold exceeds 0.4 it suddenly drops to around 20%. The green curve shows the approximation ratio between the random and myopic policies; it is about 70% and relatively stable in the interval [0, 0.45], and when the threshold exceeds 0.45 it suddenly drops to around 30%.
Delay and progress proportion. Recalling the composite sensing system in Figure 5, the server goes through five steps from the start of broadcasting to the end of reward feedback. In this experiment, we use the delay to represent the time from the beginning of the broadcast to the end of the feedback. As shown in Figure 8, the delay of the optimal policy increases significantly as the number of objects increases. In addition, after several hundred iterations, the delay of the optimal policy is basically stable. In this experiment, the optimal policy is required to complete 1500 iterations, and the progress proportion represents the percentage of completed iterations out of the total 1500 at a particular timestamp. As shown in Figure 9, as the number of objects increases, the time for the optimal policy to complete the fixed 1500 iterations is significantly extended. The main trends in the results are summarized as follows: the average quality, sensing ratio, success report ratio, and other indicators obtained by the optimal and myopic policies tend to be stable, and some indicators of the optimal policy reach stability earlier than those of the myopic policy.
The effect of the threshold setting on the myopic and random policies is much greater than on the optimal policy. As the number of objects in the experimental scene increases, the delay increases significantly and the progress proportion slows down significantly.

Conclusion
This article observed the phenomenon of composite sensing with the user acting as a sensor in crowdsourcing. The phenomenon occurs frequently but has not been well studied. We therefore proposed the composite sensing framework and mapped it to a POMDP problem. A composite sensing policy was proposed and analyzed both theoretically and experimentally, and the optimality of the policy is theoretically guaranteed. In this article, we discussed the case where the smartphone can choose one direction to sense in each time slot; as future work, we will consider the case where the smartphone may choose one or more directions to sense in each time slot. Compared with traditional methods, the use of this method on large-scale environmental data has yet to be verified and optimized.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.