Selective Sensing and Access Strategy to Maximize Throughput in Cognitive Radio Sensor Network

This paper presents a selective spectrum sensing and access strategy for a cognitive radio sensor network (CRSN) that maximizes the throughput of the secondary user (SU) system. An SU senses multiple channels simultaneously via wideband spectrum sensing. To maximize throughput and reduce sensing energy consumption, not all of the channels are sensed. The SU selects some channels for spectrum sensing and accesses them based on the sensing results, while the unselected channels are accessed directly with low transmission power. A selection making algorithm based on partially observable Markov decision process (POMDP) theory is proposed, with which the SU determines which channels to sense, how long to sense, and the transmission power on each channel. An optimal policy and a myopic policy are proposed to solve the resulting POMDP problem. Moreover, an optimization problem is formulated to solve the synchronism problem among the selected channels. Numerical results show that the proposed selective spectrum sensing and access strategy effectively improves system performance.


Introduction
Wireless sensor networks (WSNs) play a critical role in many research areas, such as machine-to-machine (M2M) networks, emergency communication, and smart homes [1][2][3][4]. WSNs have two key characteristics: limited energy and scarce spectrum. The nodes of WSNs are generally battery powered, so energy efficiency is an important design factor. In this paper, we focus on the spectrum scarcity problem of WSNs. With the dramatically increasing number of wireless devices, the scarcity of available spectrum is becoming more serious. Cognitive radio (CR), which has been introduced as a way to improve the efficiency of spectrum utilization, has become a research focus in recent years. Combining the advantages of CR and WSNs, the cognitive radio sensor network (CRSN) has been studied [1-4, 6-8, 15]. In [1,2], CR has been used in smart grid communication networks, and spectrum access strategies were proposed to find more available spectrum resources for data collection and transmission. In [3], CRSN has been studied to improve the energy efficiency of vehicular ad hoc networks. In a CRSN, the wireless nodes are defined as secondary (unlicensed) users (SUs), while the owners of the available spectrum are defined as primary (licensed) users (PUs). The key challenge is for the SUs to find available channels and adjust their transmission parameters (transmission power, transmission time, carrier frequency, etc.) to access these channels while avoiding harmful interference to PUs.
Normally, available spectrum opportunities are found in the time [4, 7, 8, 10-12, 15], space [14], and frequency [18] domains. In [10][11][12], SUs perform spectrum sensing to find access opportunities in the time domain: an SU accesses a licensed channel when it is not used by PUs. In [4], channel sensing and switching were used by the SU to access a set of PU channels. In [7,8], two spectrum sensing methods were studied in CRSN. In [15], both single-user and collaborative spectrum sensing schemes with quantised sensing observations were proposed for cognitive sensor networks. In [13], the SUs accessed the licensed channel directly at any time, with the transmission power limited to avoid unacceptable interference to the PU system, thereby exploiting spectrum access opportunities in the space domain. To find available spectrum in different channels, parallel sensing [19], cooperative sensing [17], and wideband spectrum sensing (WSS) were studied. Cooperative sensing has been studied in [16,17,20]. To find available spectrum in a wideband spectrum, the SUs either sense other channels when the current channel is occupied or detect a set of channels simultaneously [21][22][23][24][25].
International Journal of Distributed Sensor Networks
Unlike existing spectrum sensing and access studies, which exploit only the time domain or only the space domain, this paper proposes a spectrum sensing and access strategy for CRSN in which the SU system uses the available spectrum in both the time and space domains across multiple channels. WSS is used by the SU to identify the presence of PU signals. Unlike conventional spectrum sensing, in which an SU senses the channels one by one, WSS detects multiple channels simultaneously, so the sensing durations of all channels are the same. After spectrum sensing, the SU accesses all of the channels with the mixed access strategy (MAS) [26,27], under which the transmission power on each channel depends on its sensing result. When a channel is sensed as idle, the SU accesses it with a higher transmission power, exploiting the available spectrum in the time domain. Otherwise, the transmission power is kept low enough to avoid unacceptable interference to PUs, exploiting the available spectrum in the space domain. Thus, compared with other spectrum access strategies, MAS allows the SU to obtain greater throughput.
However, selecting all of the channels for WSS is not always a suitable choice, for three reasons. First, sensing all of the channels consumes a large amount of energy [9], which is undesirable for battery-powered WSNs. Second, if the PU signals in some channels are weak, much longer sensing time is needed to guarantee the protection of PUs; because WSS uses a common sensing duration for all channels, the sensing overhead of the whole system is prolonged. Finally, when the idle probabilities of some channels are low, the SU obtains a larger average throughput by accessing these channels directly than by accessing them after sensing. Thus, in this paper, the SU selects only some channels for spectrum sensing. A selected channel is accessed with a transmission power adjusted according to its sensing result; an unselected channel is accessed directly via the underlay access strategy. Therefore, in our system model, a tradeoff exists between achieving larger throughput and selecting appropriate sensing channels.
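The third reason can be checked with a small per-channel calculation. The sketch below is illustrative only: it assumes fixed detection and false-alarm probabilities, no extra time cost for sensing (the unselected channels wait for the same sensing duration anyway), and the rate values of the numerical section. Setting the two average rates equal gives the idle probability below which direct access is better, $p_0^{*} = (1-P_d) r_{11} / [(1-P_f)(r_{00}-r_{01}) + (1-P_d) r_{11}]$. The function names are hypothetical.

```python
def sensed_rate(p_idle, pd, pf, r00, r01, r11):
    """Average rate when the channel is sensed first and then accessed with MAS."""
    p_busy = 1.0 - p_idle
    # A missed detection (busy channel sensed idle) collides and contributes zero.
    return (p_idle * (1.0 - pf) * r00    # idle, sensed idle: high power P1
            + p_idle * pf * r01          # idle, false alarm: low power P2
            + p_busy * pd * r11)         # busy, detected: low power P2


def direct_rate(p_idle, r01, r11):
    """Average rate when the channel is accessed directly with low power P2."""
    return p_idle * r01 + (1.0 - p_idle) * r11


def crossover_idle_probability(pd, pf, r00, r01, r11):
    """Idle probability below which direct access beats sensing."""
    gain = (1.0 - pf) * (r00 - r01)   # extra rate from sensing an idle channel
    loss = (1.0 - pd) * r11           # rate lost to missed detections on a busy channel
    return loss / (gain + loss)


if __name__ == "__main__":
    r00, r01, r11 = 0.06, 0.02, 0.02   # Mbps, values of the numerical section
    pd, pf = 0.9, 0.1                  # hypothetical sensing performance
    for p in (0.03, 0.1, 0.3, 0.6):
        print(p, sensed_rate(p, pd, pf, r00, r01, r11), direct_rate(p, r01, r11))
    print("crossover:", crossover_idle_probability(pd, pf, r00, r01, r11))
```

With these values the crossover is about 0.053, so only channels that are rarely idle are better accessed directly; a lower detection probability or a smaller gap between the high-power and low-power rates moves the crossover up.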
Because the spectrum environment is dynamic, spectrum sensing is imperfect, and not all of the channels are sensed, the SU cannot obtain the accurate states of the PU channels. In this paper, we propose a selection making algorithm based on partially observable Markov decision process (POMDP) theory. With this algorithm, the SU determines which channels are selected for spectrum sensing, how long to sense, and the transmission powers on the accessed channels. The objective of the selection making algorithm is to maximize the throughput of the SU system while avoiding unacceptable interference to PUs. An optimal policy and a myopic policy are derived to solve the formulated POMDP problem. Moreover, we present an optimization algorithm to solve the synchronism problem among the selected channels. Extensive numerical examples demonstrate the merit of the proposed algorithms.
The contributions of this paper can be described as follows.
(i) A new selective sensing and access strategy for CRSN based on POMDP theory is proposed, in which, at the beginning of each slot, an SU selects some channels for wideband spectrum sensing and accesses all of the channels via the mixed access strategy.
(ii) An optimal policy and a myopic policy are proposed to solve the proposed POMDP problem.
(iii) We formulate an optimization problem in which the detection probability thresholds of the selected channels are jointly optimized to ensure synchronism among the selected channels.
The rest of this paper is organized as follows. Section 2 reviews related work. The system model is presented and analyzed in Section 3. In Section 4, we discuss a selection making algorithm based on the POMDP framework, and an optimal policy and a myopic policy are proposed to solve the POMDP problem. The advantages of the proposed algorithms are illustrated by numerical results in Section 5, and conclusions are drawn in Section 6.

Related Work
Wideband spectrum sensing has been discussed in [21][22][23][24][25]. In [21], the detection thresholds of the energy detectors in different channels were jointly optimized. To maximize the throughput of the SU, the sensing time and detection thresholds were jointly optimized in [22]. In [23,24], the sensing time and the transmission powers of the channels were jointly optimized. However, in these works, the SU senses all of the channels and accesses a channel only when its sensing result is idle; the selection of sensing channels was not considered. In contrast, in this paper not all of the channels are selected for sensing, and after spectrum sensing the SU accesses the channels via the mixed access strategy: whether the sensing result is idle or occupied, the SU accesses the channel with a transmission power that depends on the result, and thereby obtains greater throughput. Moreover, whereas the previous works considered WSS within a single slot, we consider sensing-channel selection and spectrum access over multiple slots, which makes the problem more complicated.
Because of the time-varying nature of the dynamic spectrum environment, POMDP is used to formulate the selective spectrum sensing and access problem. In [28,29], two optimal opportunistic spectrum access MAC protocols were proposed. In [30], a well-known separation principle was proposed that reduces the solution of the POMDP problem from an optimal policy to a myopic policy. However, these works assume that the SU can sense and access only one channel per slot. In our scheme, the SU selects several channels for spectrum sensing and accesses all of the channels after sensing, so the calculation of the sensing time becomes more complicated. In [31], an adaptive sensing scheduling scheme based on POMDP theory was proposed. The study [32] is probably the most relevant work, in which an optimal sensing-channel selection policy is proposed; however, the SU accesses a channel only when the sensing result is idle. In our work, the SU accesses all of the channels with different transmission powers. Although the previous works use the same mathematical tool (POMDP), the problem in this paper is considerably more complicated, and efficient methods are proposed to solve it.

System Model
In this section, we present the system model of this paper and the structure of wideband spectrum sensing. Then, a selective sensing and access strategy in multiple channels is proposed.

System Model.
An SU system shares a licensed wideband spectrum assigned to PUs, which can be divided into $N$ nonoverlapping narrowband channels. The channels operate in a time-slotted manner. The traffic of the PU system on each channel is modeled as a two-state ON-OFF Markov process. Figure 1 shows the structure of the SU system. We assume that the SU knows the slot duration and can keep synchronization with PUs [10,11]. Denote $\alpha_i$ as the probability that channel $i$ transits from the ON state to the OFF state and $\beta_i$ as the probability that it transits from the OFF state to the ON state. We assume that $\alpha_i$ and $\beta_i$ are obtained from previous long-term measurements.

The transition probabilities are $p_{01} = \beta_i$, $p_{00} = 1 - p_{01}$, $p_{10} = \alpha_i$, and $p_{11} = 1 - p_{10}$, where state 0 denotes idle (OFF) and state 1 denotes occupied (ON). At the beginning of each slot, the SU selects some channels for WSS and accesses them with MAS. On the other channels, the SU waits for the same duration to keep synchronism with the selected channels, and then accesses them directly. The SU cannot transmit on one channel while performing spectrum sensing on another. Denote $\tau_i(t)$ as the sensing time of channel $i$ in slot $t$. Under MAS, the SU first detects the channel state. When the sensing result is idle, the SU accesses the channel with power $P_1$; when the sensing result is occupied, the transmission power is $P_2$. Normally, we assume $P_1 > P_2$. The values of $P_1$ and $P_2$ are obtained by power allocation optimization algorithms [27], which are not the focus of this paper. If the SU accesses a channel directly, the transmission power is $P_2$. After transmission, the SU receiver returns an ACK to the SU transmitter if the transmission is successful and a NAK otherwise; the duration of the acknowledgments is ignored. The structure of the SU in one slot is shown in Figure 2.
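The ON-OFF channel model can be sketched as a two-state Markov chain. The sketch below (hypothetical function names) simulates one channel from the ON-to-OFF probability $\alpha_i$ ($= p_{10}$) and the OFF-to-ON probability $\beta_i$ ($= p_{01}$), and compares the long-run fraction of idle slots with the stationary idle probability $\alpha_i/(\alpha_i+\beta_i)$, which follows from $\pi_0 = \pi_0 p_{00} + \pi_1 p_{10}$. Treating this stationary value as the idle probability of the channel is an assumption of the sketch.

```python
import random


def stationary_idle_probability(alpha, beta):
    """Solve pi0 = pi0 * p00 + pi1 * p10 with pi0 + pi1 = 1 (p10 = alpha, p01 = beta)."""
    return alpha / (alpha + beta)


def simulate_on_off(alpha, beta, num_slots, seed=0):
    """Simulate one channel; state 0 = idle (OFF), state 1 = occupied (ON)."""
    rng = random.Random(seed)
    # Start from the stationary distribution.
    state = 0 if rng.random() < stationary_idle_probability(alpha, beta) else 1
    states = []
    for _ in range(num_slots):
        states.append(state)
        if state == 0:
            state = 1 if rng.random() < beta else 0    # OFF -> ON with probability beta
        else:
            state = 0 if rng.random() < alpha else 1   # ON -> OFF with probability alpha
    return states


if __name__ == "__main__":
    alpha, beta = 0.3, 0.2                # hypothetical transition probabilities
    states = simulate_on_off(alpha, beta, 100_000)
    print(states.count(0) / len(states), stationary_idle_probability(alpha, beta))
```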

Wideband Spectrum Sensing.
In the WSS scheme, we assume that a wideband spectrum is occupied by several PUs operating in different spectrum bands, so the idle probabilities of the channels are not the same. WSS is used to identify the presence of PUs in particular channels. In WSS, the wideband spectrum is divided into nonoverlapping channels. The SU receives data through a wideband RF antenna, and the received data is then passed through a high-speed A/D converter [21][22][23][24][25]. The structure of WSS is shown in Figure 3. Because all of the channels are detected from the same received wideband signal, the sensing durations of the channels are the same.
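A minimal sketch of why WSS imposes one sensing duration: all subbands are judged from the same block of samples. The sketch assumes complex baseband samples, a plain DFT that splits the wideband into equal, nonoverlapping subbands, unit-power noise, PU signals modeled as tones, and a hypothetical normalized energy threshold; the joint detector in the cited works may differ. All names are hypothetical.

```python
import cmath
import math
import random


def dft(x):
    """Plain O(n^2) discrete Fourier transform (adequate for a small sketch)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
            for f in range(n)]


def subband_energies(samples, num_channels):
    """Average per-bin power in each of num_channels equal, nonoverlapping subbands."""
    n = len(samples)
    bins = n // num_channels
    power = [abs(v) ** 2 / n for v in dft(samples)]   # noise-only bins average about 1
    return [sum(power[c * bins:(c + 1) * bins]) / bins for c in range(num_channels)]


def wideband_samples(n, num_channels, busy, amplitude=1.0, seed=1):
    """One block of n samples: unit-power complex noise plus a tone in each busy channel."""
    rng = random.Random(seed)
    bins = n // num_channels
    x = [complex(rng.gauss(0, math.sqrt(0.5)), rng.gauss(0, math.sqrt(0.5)))
         for _ in range(n)]
    for c in busy:
        f0 = c * bins + bins // 2          # tone at the centre bin of channel c
        for k in range(n):
            x[k] += amplitude * cmath.exp(2j * math.pi * f0 * k / n)
    return x


if __name__ == "__main__":
    n, num_channels = 256, 8
    x = wideband_samples(n, num_channels, busy={1, 3})
    threshold = 3.0                        # hypothetical normalized energy threshold
    print([e > threshold for e in subband_energies(x, num_channels)])
```

Every channel decision above uses the same n samples, so the sensing time n/f_s is shared by all channels; a channel that needs more samples lengthens the sensing for all of them.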

Selective Sensing and Access Strategy.
At the beginning of each slot, the SU determines (1) which channels are selected for spectrum sensing, (2) how long to sense, and (3) which channels are accessed with power $P_1$ and which with power $P_2$. The design objective is to maximize the throughput of the SU system over a desired period of $T$ slots while keeping the interference to PUs below predetermined thresholds. Let $\Omega(t)$ denote the set of channels selected in slot $t$. For channel $i \in \Omega(t)$, the SU senses the channel and accesses it based on the sensing result. Comparing the sensing result with the real state of the channel gives four cases. When the sensing result is idle and the channel is not occupied by PUs, the transmission with power $P_{1,i}(t)$ succeeds at rate $r_{00,i}(t)$. If the channel is actually occupied but the SU misses the PU signal, the SU transmission fails because of interference from the PU. When the sensing result is occupied, the SU transmits with power $P_{2,i}(t)$, which is limited to avoid unacceptable interference to PUs; the transmission succeeds whether the channel is actually idle or occupied. Therefore, the achievable rate of channel $i$ in slot $t$ is given by
$$R_i(t) = P_i(H_0)\left[1 - P_{f,i}(\tau_i(t))\right] r_{00,i}(t) + P_i(H_0)\, P_{f,i}(\tau_i(t))\, r_{01,i}(t) + P_i(H_1)\, P_{d,i}(\tau_i(t))\, r_{11,i}(t), \tag{1}$$
where $P_i(H_0)$ and $P_i(H_1) = 1 - P_i(H_0)$ are the idle and occupied probabilities of channel $i$, $P_{f,i}(\cdot)$ and $P_{d,i}(\cdot)$ are its false-alarm and detection probabilities, $r_{00,i}(t) = \log_2(1 + P_{1,i}(t) h_s / N_0)$, $r_{01,i}(t) = \log_2(1 + P_{2,i}(t) h_s / N_0)$, and $r_{11,i}(t) = \log_2\left(1 + P_{2,i}(t) h_s / (P_p h_p + N_0)\right)$. Here, $h_s$ and $h_p$ are the channel gains, $W$ is the bandwidth of each channel, $P_p$ is the power of PUs, and $N_0$ is the noise power. A channel that has not been selected is regarded as occupied and is accessed directly with transmission power $P_{2,i}(t)$; its transmission rate is
$$R_i(t) = P_i(H_0)\log_2\left(1 + \frac{P_{2,i}(t) h_s}{N_0}\right) + P_i(H_1)\log_2\left(1 + \frac{P_{2,i}(t) h_s}{P_p h_p + N_0}\right). \tag{2}$$
To maximize the throughput of the SU system, the optimization problem is formulated as
$$\max_{\{\Omega(t),\, \tau_i(t)\}} \sum_{t=1}^{T} \sum_{i=1}^{N} R_i(t) \tag{3}$$
subject to
$$P_{d,i}(\tau_i(t)) \ge P_{d,\mathrm{th}}, \quad i \in \Omega(t),\ t \in \{1, 2, \dots, T\}, \tag{4}$$
$$\tau_i(t) = \tau_j(t), \quad i, j \in \Omega(t),\ t \in \{1, 2, \dots, T\}, \tag{5}$$
where $N$ is the number of channels and $P_{d,\mathrm{th}}$ is the predefined detection probability threshold.
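The per-channel rate expressions can be sketched as follows, assuming the four-case average described above for a sensed channel and direct access with power $P_2$ for an unselected channel. The parameter names (`p1`, `p2`, `h_s`, `h_p`, `p_pu`, `n0`) are hypothetical, and the rates are spectral efficiencies in bit/s/Hz.

```python
import math


def case_rates(p1, p2, h_s, h_p, p_pu, n0):
    """Rates of the three successful cases (bit/s/Hz)."""
    r00 = math.log2(1 + p1 * h_s / n0)                 # sensed idle, high power P1
    r01 = math.log2(1 + p2 * h_s / n0)                 # idle channel, low power P2
    r11 = math.log2(1 + p2 * h_s / (p_pu * h_p + n0))  # busy channel, low power, PU interference
    return r00, r01, r11


def selected_rate(p_idle, pf, pd, rates):
    """Average rate of a sensed channel under MAS (a missed detection yields zero)."""
    r00, r01, r11 = rates
    return p_idle * (1 - pf) * r00 + p_idle * pf * r01 + (1 - p_idle) * pd * r11


def unselected_rate(p_idle, rates):
    """Average rate of a channel accessed directly with power P2."""
    _, r01, r11 = rates
    return p_idle * r01 + (1 - p_idle) * r11


if __name__ == "__main__":
    rates = case_rates(p1=3.0, p2=1.0, h_s=1.0, h_p=1.0, p_pu=1.0, n0=1.0)
    print(rates)   # r00 = 2, r01 = 1, r11 = log2(1.5)
    print(selected_rate(0.6, 0.1, 0.9, rates), unselected_rate(0.6, rates))
```

When $P_1 > P_2$ and the PU power is positive, the three rates are ordered $r_{00} > r_{01} > r_{11}$, which is what makes the choice between sensing and direct access nontrivial.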
Constraint (4) is a sensing time constraint that protects PUs: each selected channel must be sensed long enough to reach the target detection probability. Constraint (5) guarantees that all selected channels share the same sensing time, as required by WSS. This problem is difficult to solve directly, so we present a selection making algorithm based on POMDP theory to find optimal and suboptimal solutions.
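Constraints (4) and (5) can be made concrete with a sketch. The detection-probability model is an assumption here: the widely used Gaussian approximation for an energy detector with a normalized decision threshold $\epsilon$, sampling frequency $f_s$, and linear SNR $\gamma$, namely $P_d(\tau) = Q\left((\epsilon - \gamma - 1)\sqrt{\tau f_s / (2\gamma + 1)}\right)$. Solving $P_d(\tau) = P_{d,\mathrm{th}}$ gives each channel's minimum sensing time; because WSS uses one duration for all selected channels, the channel with the lowest SNR determines it. Function names are hypothetical, and the parameter values are those of the numerical section.

```python
import math
from statistics import NormalDist

_STD = NormalDist()   # standard normal distribution


def q_func(x):
    """Gaussian tail probability Q(x) = 1 - Phi(x)."""
    return 1.0 - _STD.cdf(x)


def detection_probability(tau, snr, eps, fs):
    """Assumed energy-detector approximation for a PU signal with linear SNR `snr`."""
    return q_func((eps - snr - 1.0) * math.sqrt(tau * fs / (2.0 * snr + 1.0)))


def min_sensing_time(snr, pd_target, eps, fs):
    """Smallest tau with detection_probability(tau) >= pd_target (inf if unreachable)."""
    if not 0.5 < pd_target < 1.0:
        raise ValueError("pd_target must lie in (0.5, 1)")
    slope = eps - snr - 1.0
    if slope >= 0.0:                          # P_d never exceeds 0.5 for any tau
        return math.inf
    q_inv = _STD.inv_cdf(1.0 - pd_target)     # Q^{-1}(pd_target), negative here
    return (2.0 * snr + 1.0) / fs * (q_inv / slope) ** 2


def common_sensing_time(snrs, pd_target, eps, fs):
    """WSS needs one duration: the weakest selected channel sets it."""
    return max(min_sensing_time(s, pd_target, eps, fs) for s in snrs)


if __name__ == "__main__":
    snrs = [2.1, 2.3, 2.5, 2.7]       # SNR values used for Figure 8
    eps, fs, pd_th = 1.5, 2e6, 0.9    # decision threshold, sampling rate, P_d,th
    tau = common_sensing_time(snrs, pd_th, eps, fs)
    print(tau, [round(detection_probability(tau, s, eps, fs), 4) for s in snrs])
```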

Selection Making Algorithm
In the proposed strategy, because not all of the channels are selected for sensing and sensing errors exist, the SU cannot obtain the accurate state of each channel. At the beginning of each slot, decisions are made based on the previous actions and observations. This setting matches well with the POMDP framework [28][29][30][31][32][33][34][35]. Therefore, the selection making problem under the proposed strategy is formulated as a POMDP, which determines an optimal policy for sensing-channel selection, the sensing time, and the access decisions. Next, we describe the POMDP framework; then an optimal policy and a myopic policy are proposed to solve the POMDP problem.
The action of the SU in slot $t$ is denoted by $A(t)$, in which $A_1(t)$ denotes which channels are selected for spectrum sensing. After transmission, the SU obtains one of the following four observations $O_i(t)$ for each channel $i$.
(i) If channel $i$ has been selected for sensing and the sensing result is idle, the SU accesses it with power $P_{1,i}(t)$. If the SU transmitter then receives ACK1, the sensing result was correct. We denote this as observation 0, $O_i(t) = 0$.
(ii) If instead the SU transmitter receives a NAK, the sensing result was wrong: the channel was occupied, and the SU transmission caused unacceptable interference to the PU. We denote this as observation 1, $O_i(t) = 1$.
(iii) If the sensing result of channel $i$ is occupied, the SU accesses it with power $P_{2,i}(t)$ and receives ACK2 after transmission. We denote this as observation 2, $O_i(t) = 2$.
(iv) If the SU accesses channel $i$ directly with power $P_{2,i}(t)$, without sensing, it also receives ACK2 after transmission. We denote this as observation 3, $O_i(t) = 3$.
The difference between observations 2 and 3 is that the SU has not sensed the channel in observation 3. Although the acknowledgment signals are the same, the SU transmitter can distinguish the two observations because it knows whether it sensed the channel.

Belief Vector.
In the POMDP formulation, a belief vector is used to infer the channel states at the beginning of each slot. It is the conditional probability distribution of the channel state given the past history, including the past decisions and observations. At the end of each slot, the belief vector is updated according to the action taken and the corresponding observation, so that it tracks the dynamic environment. The belief vector of channel $i$ in slot $t$ is denoted as $\mathbf{b}_i(t) = [\omega_{i,0}(t), \omega_{i,1}(t)]$, where $\omega_{i,0}(t)$ and $\omega_{i,1}(t) = 1 - \omega_{i,0}(t)$ are the probabilities that channel $i$ is idle and occupied, respectively. $O_i(t) = 0$ means that the real state of the channel in slot $t$ is idle, so the belief in slot $t+1$ is $\omega_{i,0}(t+1) = p_{00}$. $O_i(t) = 1$ means that the real state is occupied, so $\omega_{i,0}(t+1) = p_{10}$. When $O_i(t) = 2$, the SU senses channel $i$ and the sensing result is occupied, but the SU cannot obtain the real state of the channel. From Bayes' rule, the belief vector is updated as
$$\omega_{i,0}(t+1) = \frac{\omega_{i,0}(t)\, P_{f,i}(\tau_i(t))\, p_{00} + \omega_{i,1}(t)\, P_{d,i}(\tau_i(t))\, p_{10}}{\omega_{i,0}(t)\, P_{f,i}(\tau_i(t)) + \omega_{i,1}(t)\, P_{d,i}(\tau_i(t))}.$$
Finally, when $O_i(t) = 3$, the belief vector is updated as $\omega_{i,0}(t+1) = \omega_{i,0}(t)\, p_{00} + \omega_{i,1}(t)\, p_{10}$.
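The four belief-update rules can be collected into one function. This is a sketch with a hypothetical name; it tracks only the idle probability $\omega_{i,0}$, since $\omega_{i,1} = 1 - \omega_{i,0}$, and assumes the false-alarm and detection probabilities of the current sensing time are known.

```python
def update_belief(omega0, obs, p00, p10, pf=None, pd=None):
    """Return omega_{i,0}(t+1), the probability that the channel is idle in slot t+1.

    omega0: current probability that the channel is idle in slot t.
    obs: observation 0 (ACK1), 1 (NAK), 2 (sensed occupied, ACK2) or 3 (direct access).
    p00, p10: transition probabilities idle->idle and occupied->idle.
    pf, pd: false-alarm and detection probabilities (needed only for obs == 2).
    """
    omega1 = 1.0 - omega0
    if obs == 0:                      # channel was idle in slot t
        return p00
    if obs == 1:                      # channel was occupied in slot t
        return p10
    if obs == 2:                      # Bayes' rule on "sensed occupied", then predict
        evidence = omega0 * pf + omega1 * pd
        post_idle = omega0 * pf / evidence
        return post_idle * p00 + (1.0 - post_idle) * p10
    if obs == 3:                      # no new information: prediction only
        return omega0 * p00 + omega1 * p10
    raise ValueError("observation must be 0, 1, 2 or 3")


if __name__ == "__main__":
    for o in (0, 1, 2, 3):
        print(o, update_belief(0.5, o, p00=0.8, p10=0.3, pf=0.1, pd=0.9))
```

Observations 0 and 1 reset the belief to a fixed value because the real state was revealed, whereas observation 2 only shifts the belief toward "occupied" by an amount set by $P_f$ and $P_d$.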

Reward Function.
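Denote $R_i(t)$ as the immediate reward of channel $i$ in slot $t$; the immediate reward of the SU system is $R(t) = \sum_{i=1}^{N} R_i(t)$, which depends on the actions and observations. Denote $\tilde{R}_i(t)$ as the expected reward of channel $i$, and let $\omega_{i,0}$ and $\omega_{i,1}$ be the entries of its current belief vector. When the channel has not been selected for spectrum sensing, we have
$$\tilde{R}_i(t) = \omega_{i,0}\, r_{01,i}(t) + \omega_{i,1}\, r_{11,i}(t).$$
Otherwise, the expected reward $\tilde{R}_i(t)$ is calculated by averaging over the observation value space $o \in \{0, 1, 2\}$, weighting each observation by its probability $\Pr(O_i(t) = o \mid \mathbf{b}_i(t-1), A(t), S(t))$, which depends on the belief vector in the last slot, the actions, and the real state $S(t)$ of the current slot. The resulting expression is
$$\tilde{R}_i(t) = \omega_{i,0}\left[1 - P_{f,i}(\tau_i(t))\right] r_{00,i}(t) + \omega_{i,0}\, P_{f,i}(\tau_i(t))\, r_{01,i}(t) + \omega_{i,1}\, P_{d,i}(\tau_i(t))\, r_{11,i}(t).$$
Based on the above discussion and analysis, the procedure in the POMDP framework is shown in Figure 4.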
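The link between the observation probabilities and the expected reward of a sensed channel can be checked numerically. The sketch below assumes fixed $P_f$ and $P_d$ and writes the reward two ways: as the four-case closed form, and as $\sum_o \Pr(O = o)\,\mathbb{E}[\text{rate} \mid O = o]$ over $o \in \{0, 1, 2\}$. All names are hypothetical.

```python
def observation_probabilities(omega0, pf, pd):
    """Pr(O = o) for o in {0, 1, 2} when the channel is sensed."""
    omega1 = 1.0 - omega0
    return {
        0: omega0 * (1.0 - pf),            # idle and sensed idle -> ACK1
        1: omega1 * (1.0 - pd),            # occupied but missed -> NAK
        2: omega0 * pf + omega1 * pd,      # sensed occupied -> ACK2
    }


def expected_reward_sensed(omega0, pf, pd, r00, r01, r11):
    """Four-case closed form of the expected reward of a sensed channel."""
    omega1 = 1.0 - omega0
    return omega0 * (1.0 - pf) * r00 + omega0 * pf * r01 + omega1 * pd * r11


def expected_reward_by_observation(omega0, pf, pd, r00, r01, r11):
    """Same quantity, written as sum over o of Pr(O = o) * E[rate | O = o]."""
    probs = observation_probabilities(omega0, pf, pd)
    cond_rate = {0: r00, 1: 0.0}
    # Given O = 2 the channel may be idle (rate r01) or occupied (rate r11).
    if probs[2] > 0:
        cond_rate[2] = (omega0 * pf * r01 + (1.0 - omega0) * pd * r11) / probs[2]
    else:
        cond_rate[2] = 0.0
    return sum(probs[o] * cond_rate[o] for o in probs)


def expected_reward_direct(omega0, r01, r11):
    """Expected reward of a channel accessed directly with power P2 (observation 3)."""
    return omega0 * r01 + (1.0 - omega0) * r11


if __name__ == "__main__":
    w, pf, pd = 0.6, 0.1, 0.9
    print(observation_probabilities(w, pf, pd))
    print(expected_reward_sensed(w, pf, pd, 0.06, 0.02, 0.02),
          expected_reward_direct(w, 0.02, 0.02))
```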

Solution to POMDP.
In the proposed scheme, the design objective is to find an optimal selective sensing and access policy, specifying the actions in each slot, that maximizes the expected total reward over the $T$ slots. The complete POMDP problem formulation is
$$\max_{\pi}\ \mathbb{E}_{\pi}\left[\sum_{t=1}^{T} R(t)\right]$$
subject to
$$P_{d,i}(\tau_i(t)) \ge P_{d,\mathrm{th}}, \quad i \in \{1, 2, \dots, N\},\ t \in \{1, 2, \dots, T\},$$
where $\pi$ denotes the selective sensing and access policy. This is a constrained POMDP problem, which requires an intractable randomized policy to achieve optimality. However, the objective function can be separated from the constraint if the SU trusts the current sensing result and accesses the channel based on it [28,30]. The sensing time is then obtained from $P_{d,i}(\tau_i(t)) = P_{d,\mathrm{th}}$, $i \in \{1, 2, \dots, N\}$, and the problem becomes an unconstrained POMDP problem.
After the sensing time is obtained, the problem reduces to a simpler one with two questions: which channels are selected for spectrum sensing, and which transmission powers are used. Since the narrowband channels are independent of each other, the action of the SU system can be decomposed into the actions of the individual channels, and the optimal action of each channel can be calculated independently.

Optimal Policy.
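In order to calculate the optimal policy of channel $i$ efficiently, a value function $V_t(\mathbf{b}_i(t-1))$ is defined, which denotes the maximum expected remaining reward accumulated from slot $t$ to the horizon slot $T$ when the current belief vector is $\mathbf{b}_i(t-1)$. Using the Bellman equation, we have
$$V_t(\mathbf{b}_i(t-1)) = \max_{A_i(t)} \left\{ \tilde{R}_i(t) + \sum_{o} \Pr\left(O_i(t) = o \mid \mathbf{b}_i(t-1), A_i(t)\right) V_{t+1}(\mathbf{b}_i(t)) \right\},$$
where $\mathbf{b}_i(t)$ is the belief vector after action $A_i(t)$ and observation $o$; it represents the updated knowledge of the channel state based on the actions and observations of the SU in slot $t$.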
Here, $\tilde{R}_i(t)$ is given by (13). The value function consists of two parts: the immediate expected reward obtained in slot $t$ and the maximum expected future reward. The optimal policy is obtained via a fast point-based solution method [35].
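The Bellman recursion can be illustrated for a single channel with a simple grid-based backward induction. This is not the point-based solver of [35]; it is a sketch assuming fixed $P_f$ and $P_d$ (a given sensing time), a belief summarized by the idle probability $\omega_{i,0}$, linear interpolation between grid points, and $V_{T+1} = 0$. The function name and parameters are hypothetical.

```python
def solve_single_channel(horizon, p00, p10, pf, pd, r00, r01, r11, grid_size=201):
    """Grid-based backward induction for one channel.

    Returns (values, actions) for the first slot: values[k] approximates V_1 at
    idle belief k / (grid_size - 1), and actions[k] is "sense" or "direct".
    """
    grid = [k / (grid_size - 1) for k in range(grid_size)]

    def interp(values, w):
        """Linear interpolation of a value function stored on the grid."""
        pos = min(max(w, 0.0), 1.0) * (grid_size - 1)
        lo = min(int(pos), grid_size - 2)
        frac = pos - lo
        return (1.0 - frac) * values[lo] + frac * values[lo + 1]

    v_next = [0.0] * grid_size        # V_{T+1} = 0: no reward after the horizon
    actions = []
    for _ in range(horizon):          # slots T, T-1, ..., 1
        v_now, act_now = [], []
        for w in grid:
            u = 1.0 - w
            # Direct access: observation 3, the belief is only propagated.
            q_direct = w * r01 + u * r11 + interp(v_next, w * p00 + u * p10)
            # Sensing: observations 0, 1, 2 and their next-slot beliefs.
            pr0, pr1, pr2 = w * (1 - pf), u * (1 - pd), w * pf + u * pd
            future = pr0 * interp(v_next, p00) + pr1 * interp(v_next, p10)
            if pr2 > 0:
                post = w * pf / pr2
                future += pr2 * interp(v_next, post * p00 + (1 - post) * p10)
            q_sense = w * (1 - pf) * r00 + w * pf * r01 + u * pd * r11 + future
            if q_sense >= q_direct:
                v_now.append(q_sense)
                act_now.append("sense")
            else:
                v_now.append(q_direct)
                act_now.append("direct")
        v_next, actions = v_now, act_now
    return v_next, actions


if __name__ == "__main__":
    values, actions = solve_single_channel(30, 0.8, 0.3, 0.1, 0.9, 0.06, 0.02, 0.02)
    print(values[0], values[-1], actions[0], actions[-1])
```

With a horizon of one slot the recursion reduces to the myopic rule; longer horizons add the value of the information that sensing provides about future slots.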

Myopic Policy.
Computing the optimal policy has high computational complexity, especially when the number of channels is large. To address this problem, a myopic policy is proposed, in which the SU maximizes the immediate expected reward in the current slot $t$:
$$A_i^{*}(t) = \arg\max_{A_i(t)} \tilde{R}_i(t).$$
The myopic policy thus trades some optimality for much lower computational complexity. Dynamic programming can be used to find the solution.
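A sketch of the myopic rule in operation, assuming fixed $P_f$ and $P_d$ (sensing time held constant), the rate values of the numerical section, and independent channels. In each slot, every channel takes the action with the larger immediate expected reward, the true Markov states are simulated, and the beliefs are updated with the observation rules of the belief-vector subsection. The simulator and its names are hypothetical; it is not the simulation code behind the reported results, and it ignores the common-sensing-time coupling among the selected channels.

```python
import random


def myopic_action(omega0, pf, pd, r00, r01, r11):
    """Choose the action with the larger immediate expected reward."""
    sense = omega0 * (1 - pf) * r00 + omega0 * pf * r01 + (1 - omega0) * pd * r11
    direct = omega0 * r01 + (1 - omega0) * r11
    return "sense" if sense >= direct else "direct"


def simulate_myopic(channels, pf, pd, rates, num_slots, seed=0):
    """Run the per-channel myopic policy on independent ON-OFF channels.

    channels: list of (p00, p10) transition probabilities, one pair per channel
    (p00 < 1 or p10 > 0 is assumed). rates: (r00, r01, r11).
    Returns the throughput accumulated over all slots and channels.
    """
    r00, r01, r11 = rates
    rng = random.Random(seed)
    beliefs, states = [], []
    for p00, p10 in channels:
        pi0 = p10 / (p10 + 1.0 - p00)             # stationary idle probability
        beliefs.append(pi0)
        states.append(0 if rng.random() < pi0 else 1)
    total = 0.0
    for _ in range(num_slots):
        for i, (p00, p10) in enumerate(channels):
            w, idle = beliefs[i], states[i] == 0
            if myopic_action(w, pf, pd, r00, r01, r11) == "sense":
                p_sensed_idle = (1 - pf) if idle else (1 - pd)
                if rng.random() < p_sensed_idle:     # observation 0 or 1
                    total += r00 if idle else 0.0
                    beliefs[i] = p00 if idle else p10
                else:                                # observation 2: Bayes update
                    total += r01 if idle else r11
                    post = w * pf / (w * pf + (1 - w) * pd)
                    beliefs[i] = post * p00 + (1 - post) * p10
            else:                                    # observation 3: prediction only
                total += r01 if idle else r11
                beliefs[i] = w * p00 + (1 - w) * p10
            # The true state evolves according to the Markov chain.
            states[i] = 0 if rng.random() < (p00 if idle else p10) else 1
    return total


if __name__ == "__main__":
    chans = [(0.8, 0.3), (0.9, 0.1), (0.5, 0.5), (0.2, 0.05)]
    print(simulate_myopic(chans, 0.1, 0.9, (0.06, 0.02, 0.02), num_slots=30, seed=3))
```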

Synchronism among Channels.
In the proposed solutions, the optimal policy and the myopic policy are calculated for each channel independently rather than for all channels jointly. Each channel operates at $P_{d,i}(\tau_i(t)) = P_{d,\mathrm{th}}$, from which its sensing time $\tau_i(t)$ is obtained, so the sensing times of different channels may differ. However, in this paper, the SU senses the selected channels with WSS, so the sensing times of the selected channels must be the same. Therefore, a question arises: how can synchronism among the selected channels be ensured?
Denote $P_{d,\mathrm{th},i}$ as the detection probability threshold of selected channel $i$. To maximize the current reward of the slot while ensuring the synchronism of the selected channels, we adjust $P_{d,\mathrm{th},i}$ according to the differences among the selected channels. The sensing time of selected channel $i$ is obtained from $P_{d,i}(\tau_i(t)) = P_{d,\mathrm{th},i}$.
Then, $P_{d,\mathrm{th},i}$ is expressed as a function of the sensing time, $P_{d,\mathrm{th},i}(\tau(t))$, where $\tau(t)$ is the common sensing time of the selected channels. The optimization problem is formulated as
$$\max_{\tau(t)} \sum_{i \in \Omega(t)} \tilde{R}_i(t) \tag{20}$$
subject to
$$P_{d,\mathrm{th},i}(\tau(t)) \ge P_{d,\mathrm{th}}, \quad i \in \Omega(t). \tag{21}$$
The detection probability threshold of each channel is adjusted according to the sensing time. $P_{d,\mathrm{th},i}(\tau(t))$ should be no smaller than the predefined threshold $P_{d,\mathrm{th}}$ to ensure the protection of PUs. Based on the convexity of the function $P_{d,\mathrm{th},i}(\tau(t))$, constraint (21) is converted into an interval for the sensing time $\tau(t)$, and a one-dimensional exhaustive search is used to solve the problem. The sensing time in slot $t$ is thus optimized to balance maximizing the immediate expected reward and keeping synchronism among the selected channels.
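The one-dimensional search can be sketched as follows under stated assumptions: the commonly used Gaussian energy-detector approximations $P_d(\tau) = Q\left((\epsilon-\gamma-1)\sqrt{\tau f_s/(2\gamma+1)}\right)$ and $P_f(\tau) = Q\left((\epsilon-1)\sqrt{\tau f_s}\right)$, a reward scaled by $(1 - \tau/T_s)$ to model transmission time spent on sensing (this factor is an assumption, not stated in the text), and a grid step of 1 μs. In this model $P_d$ increases with $\tau$ for every channel, so constraint (21) becomes $\tau \ge \max_i \tau_i^{\min}$, which is the interval searched. Names and defaults are hypothetical.

```python
import math
from statistics import NormalDist

_STD = NormalDist()


def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 1.0 - _STD.cdf(x)


def pd_of(tau, snr, eps, fs):
    """Assumed energy-detector detection probability."""
    return q_func((eps - snr - 1.0) * math.sqrt(tau * fs / (2.0 * snr + 1.0)))


def pf_of(tau, eps, fs):
    """Assumed energy-detector false-alarm probability."""
    return q_func((eps - 1.0) * math.sqrt(tau * fs))


def slot_reward(tau, channels, eps, fs, slot, rates):
    """Assumed expected reward of the selected channels for a common sensing time tau.

    channels: list of (omega0, snr). The factor (1 - tau / slot) is an assumption
    that models the transmission time lost to sensing.
    """
    r00, r01, r11 = rates
    pf = pf_of(tau, eps, fs)
    total = sum(w * (1 - pf) * r00 + w * pf * r01 + (1 - w) * pd_of(tau, s, eps, fs) * r11
                for w, s in channels)
    return (1.0 - tau / slot) * total


def search_common_tau(channels, pd_target, eps, fs, slot, rates, step=1e-6, tau_max=1e-3):
    """Exhaustive 1-D search over the feasible interval [tau_lo, tau_max]."""
    if not 0.5 < pd_target < 1.0:
        raise ValueError("pd_target must lie in (0.5, 1)")
    if any(eps - s - 1.0 >= 0.0 for _, s in channels):
        raise ValueError("detection target unreachable for some channel")
    q_inv = _STD.inv_cdf(1.0 - pd_target)
    # Every channel must reach pd_target, so the interval starts at the largest minimum.
    tau_lo = max((2 * s + 1) / fs * (q_inv / (eps - s - 1.0)) ** 2 for _, s in channels)
    best_tau, best_reward = tau_lo, slot_reward(tau_lo, channels, eps, fs, slot, rates)
    k = 1
    while tau_lo + k * step <= tau_max:
        tau = tau_lo + k * step
        reward = slot_reward(tau, channels, eps, fs, slot, rates)
        if reward > best_reward:
            best_tau, best_reward = tau, reward
        k += 1
    return tau_lo, best_tau, best_reward


if __name__ == "__main__":
    chans = [(0.6, 2.1), (0.5, 2.3), (0.4, 2.7)]   # hypothetical (belief, SNR) pairs
    print(search_common_tau(chans, 0.9, 1.5, 2e6, 0.01, (0.06, 0.02, 0.02)))
```

The lower end of the interval comes from the weakest selected channel; moving slightly above it lowers the false-alarm probability, which is why the optimum in this sketch lies inside the interval rather than at its edge.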

Numerical Results
In this section, the proposed optimal selective sensing and access policy is compared with the myopic policy and a random policy under different simulation conditions. In the random policy, the SU selects the sensing channels and the access actions at random. The sensing times of the three policies are the same and are obtained from the proposed optimization problem (20). The slot duration of PUs is fixed at 10 ms, equal to the sensing period of the SU system. The channel power gains are ergodic and stationary. For simplicity, we assume that adaptive modulation is not used [33,34]. When the transmission power is $P_1$, the transmission rate is $r_{00} = 0.06$ Mbps; when the power is $P_2$, the transmission rates are $r_{01} = r_{11} = 0.02$ Mbps. The bandwidth of each narrowband channel is 1 MHz and the sampling frequency is 2 MHz. The decision threshold is 1.5, the predefined detection probability threshold is 0.9, and the number of slots is 30. Figure 5 shows the aggregate throughput of the SU versus the total number of channels. We consider two cases, in which the number of subchannels is 6 and 2, respectively. The idle probability and the signal-to-noise ratio (SNR) of each channel are listed in Table 1. The optimal policy outperforms the other policies. The reason is that the SNR of a channel affects its sensing time; because the selected channels share a common sensing time, the channel with the smallest SNR strongly affects the sensing time of the whole system when the number of channels is large. Under the optimal policy, the SU obtains greater throughput gains by selecting only some channels for spectrum sensing rather than all of them, and its selection is more accurate than that of the myopic policy. The SU under the random policy may select channels with lower SNR, which prolongs the sensing time and reduces the throughput of the SU system. Hence, when the number of channels is large, the optimal policy obtains greater throughput than the others.
In Figure 6, we study the aggregate throughput of the SU for different idle-probability gaps between adjacent PU channels. The idle probability of channel $i$ is $P_i(H_0) = 0.3 + (i-1)(0.1 + 0.05(k-1))$. When $k = 1, 2, 3$, the idle probabilities of the channels are {0.3, 0.4, 0.5, 0.6}, {0.3, 0.45, 0.6, 0.75}, and {0.3, 0.5, 0.7, 0.9}, respectively. The SU under the optimal policy obtains greater throughput than the others in all cases. This is because the idle probabilities of the channels affect the updating of the belief vector, and under the optimal policy the SU selects suitable sensing channels and access actions. Figure 7 illustrates the aggregate throughput under the optimal sensing time and under a fixed sensing time, both with the myopic policy. In the fixed sensing time case, the SU senses the channels in each slot with $\tau_i(t) = 2$ ms, $i \in \{1, 2, 3, 4\}$, $t \in \{1, 2, \dots, 30\}$. The SU with the optimal sensing time obtains greater throughput, because the sensing time in each slot is optimized to balance maximizing the throughput against sensing efficiency. Figure 8 illustrates the aggregate throughput under different access strategies, all with the optimal policy. There are four channels; the idle probability of channel $i$ is $P_i(H_0) = 0.3 + 0.2(i-1)$ and its SNR is $\gamma_i = 2.1 + 0.2(i-1)$, $i \in \{1, 2, 3, 4\}$. The SU with the mixed access strategy obtains larger throughput than with the other strategies. In the underlay access strategy, the SU accesses the channels directly without sensing, which saves the sensing time; however, the transmission power is fixed at $P_2$.
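The scenario parameters for Figures 6 and 8 follow simple arithmetic rules. This sketch (hypothetical helper names, with `k` the case index of Figure 6) reproduces the listed values; results are rounded to four decimals to remove floating-point noise.

```python
def figure6_idle_probabilities(k, num_channels=4):
    """Idle probability 0.3 + (i - 1)(0.1 + 0.05(k - 1)) for channels i = 1..num_channels."""
    step = 0.1 + 0.05 * (k - 1)
    return [round(0.3 + (i - 1) * step, 4) for i in range(1, num_channels + 1)]


def figure8_parameters(num_channels=4):
    """(idle probability, SNR) = (0.3 + 0.2(i - 1), 2.1 + 0.2(i - 1)) for each channel."""
    return [(round(0.3 + 0.2 * (i - 1), 4), round(2.1 + 0.2 * (i - 1), 4))
            for i in range(1, num_channels + 1)]


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, figure6_idle_probabilities(k))
    print(figure8_parameters())
```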
In the overlay access strategy, the SU accesses a channel only when the sensing result is idle. In contrast, under the mixed access strategy the SU accesses the channels whether the sensing result is idle or occupied, and thus obtains greater throughput.

Conclusion
In this paper, we propose a selective spectrum sensing and access strategy for cognitive radio sensor networks. To maximize the aggregate throughput of the SU system and reduce the energy consumed by spectrum sensing, the SU selects some channels for spectrum sensing, accesses these channels based on the sensing results, and accesses the other channels directly. To cope with the dynamic spectrum environment, a selection making algorithm based on POMDP theory is proposed, and an optimal policy and a myopic policy are proposed to solve the POMDP problem. Theoretical analysis and numerical results show that the proposed selection making algorithm better balances maximizing the throughput of the SU system and avoiding unacceptable interference to PUs.