Analysis of space labeling through binary fingerprinting

In the context of fingerprinting applications, this article presents the performance analysis of a type of space labeling based on the binary quantization of the received signal strength indicator. One of the common drawbacks of fingerprinting is the large data size and, consequently, the large search space and computational load, which result from either the vastness of the positioning area or a finer resolution of the fingerprinting grid map. Our approach can be considered, for example, when very small, inexpensive beacons are used, such as those based on Bluetooth Low Energy technology or radio frequency identification, or in the future context of the Internet of Things. One of the interesting properties of this deployment is that it can be interpreted as a form of space labeling or encoding, since space is divided into cells and each cell is associated with a binary codeword, with a corresponding scalability of the spatial resolution. Here, we develop the performance estimation by exploiting the association of this deployment with an error correcting code. The analysis and the numerical and experimental results allow a deeper understanding of the impact of the proposed solution and show that it is robust and computationally efficient with respect to the traditional fingerprinting technique.


Introduction
Due to recent developments in hardware electronics and communications, wireless personal area networks (WPANs) and wireless sensor networks (WSNs) have found outstanding importance in diverse applications such as industrial, medical, public services, and many other fields. At the same time, Internet of things (IoT) is expected to become the pervasive infrastructure of the information society. In this context, position information is an important enabler of new and future value-added services.
In the last decade, we have observed a constantly growing effort, from both the scientific community and industry, to design a global solution for indoor positioning and navigation that achieves high precision, reliability, and cost effectiveness. The availability of such indoor positioning systems (IPS) will enable advances in location-aware applications, pervasive computing, and ambient intelligence. According to Wirola et al., 1 high-accuracy indoor positioning will be based on dedicated positioning-specific tags because of the trade-off between costs and performance. In fact, the wireless local area network (WLAN) technology offers the best solutions in terms of costs, thanks to the reuse of available infrastructures, although it has many limitations in terms of accuracy and reliability. A solution based on Bluetooth Low Energy (BLE) technology offers several advantages. Indeed, it ensures that the mobile terminals require no new hardware components and that the radio components are already in mass production, keeping the costs of deployment and new applications low. As a result, it represents a valid candidate for a large-scale solution.
The targets' position can be estimated via geometrical, Bayesian, or pattern-matching approaches. [2][3][4] The geometrical approaches use range-based techniques such as triangulation and multilateration. The range is estimated from the received signal exploiting different signal metrics, for example, time of arrival (ToA), angle of arrival (AoA), and received signal strength indicator (RSSI). The Bayesian approaches, used mainly for tracking application, compute the probability density function of the target position, providing its mean value as the estimated position and the variance as the uncertainty of the estimate. The pattern-matching approaches or fingerprinting (FP) take advantage of location dependent (LD) features of the signals received by static reference stations, or beacons: these signals, typically RSSI measures, are exploited as unique signatures associated with the target locations. First, a radio map containing stored LD parameters measured over predetermined points (grid points) is built during an off-line or training phase and then the target position is estimated via pattern matching between measured LD parameters and those previously recorded. In He and Chan, 5 it is possible to find some recent trends in two of the major research areas for FP localization: advanced localization techniques and efficient system deployment. Among the advanced approaches, some techniques are based on location-specific characteristics of multipath instead of RSSI as in Wu et al. 6 and Jin et al., 7 while, in the specific area of IoT applications, the RSSI database is assisted in the online phase by other methods for increasing positioning accuracy, as magnetic or dynamic measures from other sensors or prediction models, as in Lin et al. 8 On the contrary, where complexity and costs are a hard constraint, the FP techniques are used since they can be really easy to implement and are cost-effective, while maintaining a reasonable degree of accuracy. 
Among the main complexity issues, we can observe that the search space during the online step can be computationally intense, either because the deployment area is wide (e.g. smart city, hospitals, or large factories) or because it is based on devices, such as BLE tags, with strong limitations on power consumption and limited hardware capabilities. In order to overcome this restraint, the works in Arya et al. 9 and Saha and Sadhukhan 10 have proposed clustering and spatial filtering techniques that limit the positioning algorithm to a subset of reference points (RPs) in order to narrow down the search space and focus on the relevant subset of RPs.
In this context, in the past, we have focused on the possibility of reducing further complexity and requirements of FP systems based on simple beacons in limited indoor environments. In Mizmizi and Reggiani, 11 we have investigated the role of quantization in the RSSI information. The obtained results showed that the computational complexity can be reduced, without loss of precision, by adapting the RSSI quantization with respect to the variance of the measurement noise. In Mizmizi and Reggiani, 12 we have focused on the simplest quantization, with two levels, exploring its relations with binary encoding. We have introduced a new scheme of a specific binary representation of the RSSI signatures and the measures (binary fingerprinting or BFP). This novel design is appropriate when the beacons are characterized by very limited size, cost, and computational capability, like in the BLE or in the future technologies for the pervasive IoT. In addition to the pure localization, BFP can be used for creating an ordered partitioning and labeling of the space, available in a simple and inexpensive way to each device in the covered area, with numerous potential applications in the area of context awareness.
In this article, we consider the design presented in Mizmizi and Reggiani, 12 developing the analytical estimation of the space discrimination provided by this novel spatial labeling. The main contributions of this article are (1) the analysis of the FP technique in two cases: in the former, the number of RSSI quantization levels is assumed infinite, while, in the latter, the quantization levels are reduced to 2, obtaining the BFP; the former is always used as a benchmark for the system; (2) the interpretation of BFP as a spatial encoding with scalable resolution and logarithmic search of the solution; and (3) the validation of the numerical results through experimental tests and channel models derived from experimental measures. The numerical findings, obtained by means of analysis, simulations, and experimental measures, show the impact of different parameters and channel conditions on the system performance.
The remainder of this work is organized as follows: the ''System model'' section describes the network scenario and the ''Review of FP techniques'' section reviews the FP techniques. In the ''BFP design'' section, we introduce the BFP approach and, in the ''Analysis of FP performance'' section, the analysis of both BFP and FP techniques. Finally, in the ''BFP performance'' section, the numerical results are reported and discussed.

System model
The considered network model is the same as used in Mizmizi and Reggiani, 11,12 that is, an asynchronous sensor network containing a number of target devices over a limited square area on a single floor, with two-dimensional coordinates $(x, y)$.
In the area, there is a set of $N_B$ fixed nodes called beacons (BS) with known positions $P^{BS}_i = \{x_i, y_i\}$, $i = 1, 2, \ldots, N_B$. A rectangular grid is defined over the two-dimensional area, and any estimate of a target location is limited to the points on this grid. The grid of points has a resolution of $\Delta$ meters. Assuming that the grid spacing results in $K_x$ points along the $x$ coordinate and $K_y$ along the $y$ coordinate, we have a total number of positions in the area
$$K_T = K_x \cdot K_y \tag{1}$$
Any position can be represented by a triplet $(x, y, z)$, where $x$ and $y$ represent the 2D coordinates on the floor plane while $z$ represents the height of the antenna at that particular grid position. The coordinate $z = 0$ is assumed for all the points unless otherwise mentioned, and hence, the coordinates $(x, y)_i$ denote the location of the $i$th FP signature.
The experimental measures are taken in a room at Dipartimento di Elettronica, Informazione and Bioingegneria (DEIB) of Politecnico di Milano, whose map is sketched in Figure 1.

Signal model and measures
The most common model adopted for the real RSSI, recorded and stored during the off-line phase, is the received power under log-normal shadowing with an isotropic receiver antenna gain of 0 dB, that is
$$RSSI = EIRP - L_0 - 10\,\alpha \log_{10}\left(\frac{d}{d_0}\right) - L_{SH} \tag{2}$$
where the Effective Isotropic Radiated Power $EIRP$ incorporates the transmitter power and the antenna gain, $L_0$ is the average propagation loss at the reference distance $d_0$ (usually 1 m), $\alpha$ is the path loss exponent (PLE), $d$ is the distance between transmitter and receiver, and $L_{SH} \sim N(0, \sigma^2_{SH})$ is the log-normal fluctuation due mainly to obstacles in the environment. 13 According to the relative position between the point on the grid and the beacon, the values of $L_0$ and $\alpha$ also depend on the line of sight (LoS) or non-LoS (NLoS) conditions. In addition, each RSSI measured by the target during the online phase is affected by a measure error $W$ with variance $\sigma^2_W$. This additional error is due to (1) channel random fluctuations (or multipath fading), widely studied in outdoor and indoor environments, 14 and (2) system impairments, 15 such as differences in target device types, user orientation, environmental changes, or mobile devices in different places or at different heights (pockets, bags, hands, etc.). Therefore, the RSSI measures are modeled with an additional random log-normal component, uncorrelated with the channel shadowing component in equation (2)
$$\widetilde{RSSI} = RSSI + W \tag{3}$$
where $W \sim N(0, \sigma^2_W)$. The experimental measures in the area in Figure 1 have confirmed the validity of the model in equation (3) and they have returned the distributions of the RSSI levels from each beacon in the covered space; an example is reported in Figure 2. In order to extend the possibility of analysis and simulation to other environments and to different sizes, we have also used the experimental data for deriving the parameters of the model in equation (2), separately for the signal coming from each beacon, by means of a linear regression.
The numerical values, averaged among all the beacons for the sake of brevity, are reported in Table 1.
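As an illustration of the model in equations (2) and (3) and of the regression step, the following minimal Python sketch generates RSSI values and fits the path loss exponent by least squares. The function names and the default values (EIRP of 0 dBm, $L_0$ of 40 dB, $d_0$ of 1 m) are hypothetical placeholders chosen for the example; they are not the values of Table 1, and the regression shown is only one plausible way of carrying out the fit.

```python
import math
import random

def rssi_dbm(d, eirp_dbm=0.0, l0_db=40.0, alpha=2.0, d0=1.0,
             sigma_sh=0.0, sigma_w=0.0, rng=random):
    """RSSI in dBm at distance d: log-distance path loss, shadowing, measure error.

    All default parameter values are illustrative assumptions.
    """
    mean = eirp_dbm - l0_db - 10.0 * alpha * math.log10(d / d0)
    shadowing = rng.gauss(0.0, sigma_sh) if sigma_sh > 0 else 0.0  # L_SH in eq. (2)
    error = rng.gauss(0.0, sigma_w) if sigma_w > 0 else 0.0        # W in eq. (3)
    return mean - shadowing + error

def fit_path_loss(distances, rssi_values, d0=1.0):
    """Least-squares fit of RSSI = a - 10*alpha*log10(d/d0); returns (a, alpha)."""
    xs = [-10.0 * math.log10(d / d0) for d in distances]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(rssi_values) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, rssi_values))
    alpha = sxy / sxx          # slope is the path loss exponent
    a = my - alpha * mx        # intercept is EIRP - L_0
    return a, alpha

if __name__ == "__main__":
    dists = [1.0, 2.0, 4.0, 8.0]
    values = [rssi_dbm(d, alpha=2.5) for d in dists]
    print(fit_path_loss(dists, values))
```

With noise-free samples the fit returns the generating parameters; with shadowing enabled, the residual spread around the fitted line gives an estimate of $\sigma_{SH}$.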

Review of FP techniques
Conventional localization algorithms using signal information like ToA, RSSI, or AoA face a serious performance degradation in indoor environments affected by phenomena like harsh multipath and NLoS. Taking advantage of LD features of the radio signal, a radio map containing LD parameters measured at predetermined grid points can be built, so that the target position can be estimated using pattern matching algorithms. In IPS based on radio frequency (RF) technologies such as WLAN or BLE, FP methods are among the most used, 3 thanks to their simplicity and reliability. There is a variety of measurements that can be used. The most common is the RSSI, but signal to noise ratio (SNR), link quality information (LQI), channel impulse response, and others can also be exploited.
FP is implemented in two basic steps: in the first step, which is called the off-line or training phase, LD parameters of the received signal are measured in a grid-based map over the surveyed area; these are stored, and they form the so-called radio-map. In the second phase, also called the online phase, the target position is estimated by pattern matching between the ongoing measurement of LD parameters and the stored radio-map.
The construction of the radio-map begins by dividing the area of interest into cells with the help of a floor plan. The RSSI values of the radio signals transmitted by beacons are collected by a test target inside the cells (or calibration points $P_k = \{x, y\}_k$) for a certain period of time and stored into a database. The $k$th element ($k = 1, \ldots, K_T$) in the radio-map has the form
$$M_k = \{(x, y)_k, R_k\} \tag{4}$$
where $R_k$ is the fingerprint vector of measured RSSI from the beacons, and $(x, y)_k$ is the location of the $k$th fingerprint. The database term $M_k$ can contain further information, such as orientation or other indicators. The radio-map can be modified or pre-processed before applying it in the location estimation phase. The reason can be the reduction of the memory requirements and/or of the computational cost of location estimation. In addition, different location estimation methods use different characteristics of the fingerprint histogram, such as the mean and the variance. During the online phase, the target collects a vector of measurements (here RSSI) from the beacons
$$\tilde{R} = \{\tilde{r}_1, \ldots, \tilde{r}_{N_B}\} \tag{5}$$
In order to estimate the position of the target user $p_{MS}$, two main approaches are used:
Deterministic: the position of the target user $p_{MS}$ is not considered as a random vector. 16 The main objective is to estimate $\hat{p}_{MS}$ at each time step. Usually, the estimate is a convex combination of the calibration points $P_k$, that is
$$\hat{p}_{MS} = \sum_{k=1}^{K} \frac{w_k}{\sum_{j=1}^{K} w_j} P_k \tag{6}$$
where $w_k$ are the weights applied to the $k$th calibration point, which can be inversely proportional to the RSSI norms, or
$$w_k = \frac{1}{\|\tilde{R} - R_k\|} \tag{7}$$
where $\|\cdot\|$ is a norm, for example, the Euclidean. The estimation technique described in equation (6) is known as ''Weighted K-Nearest Neighbor'' (WKNN), and it is one of the most used by the FP algorithms. When all the calibration points use the same weight, it is called ''K-NN,'' and when $K = 1$, it is denoted simply as ''NN.'' In general, K-NN and WKNN can perform better than the NN method, particularly with parameter values $K = 3$ and $K = 4$. 16 However, if the density of the radio map is high, the NN method can perform as well as more complicated methods.
Probabilistic: the position of the target user p MS is considered as a random vector. The idea behind the probabilistic approach is to compute the conditional (a posteriori) probability density function of the state from the measurements. The a posteriori pdf contains all the necessary information for computing an arbitrary estimate of the state and an estimate of the error. Two common estimators can be used: the maximum a posteriori (MAP) and the minimum mean square error (MMSE). The first computes the maximum of the a posteriori pdf and the second computes its mean.
FP-based methods produce accurate position estimates in indoor environments; 17 they are easy to implement and the cost of the system is low since no further hardware is needed, RSSI measurements being available in every radio technology. However, they have two main drawbacks: first, the off-line phase is laborious and time consuming, and changes in the environment can compromise the overall system. Second, the vastness of the radio-map can make the online estimation computationally heavy, especially for IoT devices.
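The deterministic estimate reviewed above can be sketched in a few lines of Python. This is a minimal illustration, assuming Euclidean signal distance and weights inversely proportional to that distance; the function name `wknn_estimate`, the data layout of the radio map, and the small constant `eps` that avoids a division by zero are all assumptions made for the example.

```python
import math

def wknn_estimate(radio_map, sample, k=3, eps=1e-9):
    """Weighted K-nearest-neighbor position estimate.

    radio_map: list of ((x, y), [RSSI per beacon]) calibration entries.
    sample:    list of RSSI values measured online, one per beacon.
    """
    scored = []
    for (x, y), fingerprint in radio_map:
        dist = math.sqrt(sum((s - f) ** 2 for s, f in zip(sample, fingerprint)))
        scored.append((dist, x, y))
    scored.sort(key=lambda item: item[0])
    nearest = scored[:k]                       # the K closest fingerprints
    weights = [1.0 / (d + eps) for d, _, _ in nearest]
    total = sum(weights)                       # normalization keeps the combination convex
    x_hat = sum(w * x for w, (_, x, _) in zip(weights, nearest)) / total
    y_hat = sum(w * y for w, (_, _, y) in zip(weights, nearest)) / total
    return x_hat, y_hat

if __name__ == "__main__":
    rmap = [((0.0, 0.0), [-50.0, -70.0]),
            ((3.0, 0.0), [-60.0, -60.0]),
            ((0.0, 3.0), [-70.0, -50.0])]
    print(wknn_estimate(rmap, [-58.0, -62.0], k=3))
```

Setting `k=1` reduces the sketch to the NN method, which is the estimator assumed in the analysis later in the article.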

BFP design
In Mizmizi and Reggiani, 12 we have presented the BFP design, where the system is seen from a different point of view, related to a binary code interpretation of the FP scheme. The $\log_2(K_T)$ bits that enumerate the $K_T$ grid positions and the corresponding FP signatures are now transformed, or encoded, in the $N_B \cdot \log_2(L)$ bits of each signature, where $L$ is the number of levels used for each RSSI measure. The resulting code rate is defined as
$$R_c = \frac{\log_2(K_T)}{N_B \cdot \log_2(L)} \tag{8}$$
where, in the binary case ($L = 2$), we have
$$R_c = \frac{\log_2(K_T)}{N_B} \tag{9}$$
In order to design the BFP scheme, the main steps are
(1) Given the area of interest $A$, which is an arbitrary polygon, define the smallest square $S$ with side length $s$ that includes $A$.
(2) Define the cell size $\Delta$, so that the number of cells $\hat{K}$ obtained is a power of 2.
(3) Define a beacons placement, according to an iterative procedure, similar to the Gray code encoding process. 18 The numbers of beacons for the minimum Hamming distance $D_H = 1$ and different numbers of cells are reported in Table 2.
(4) Quantize each RSSI measure in 2 levels according to the threshold $RSSI_{REF}$, which, in general, depends on the beacon. Therefore, the RSSI coming from the $b$th beacon is quantized into
$$BFP(b) = \begin{cases} 1 & \text{if } RSSI_b \ge RSSI_{REF,b} \\ 0 & \text{otherwise} \end{cases} \tag{12}$$
The threshold $RSSI_{REF,b}$ can be obtained through measurements at the borders that delimit the regions with label 1 and the regions with label 0 for each beacon (in practice, the final threshold is obtained by averaging the values of the field measured on a set of points along the border). Of course, in the simulations, it can be computed using the model defined in equation (2).
(5) Discard all the cells that do not belong to the effective area of interest $A$.
This design refers to a static situation of the RSSI and of the beacons placement; in the presence of a dynamic condition, caused by fluctuations or permanent changes of the channel propagation, the algorithm update is limited to the change of the $N_B$ quantization thresholds $RSSI_{REF,b}$, one for each beacon. This is different from the type of update necessary in the conventional FP, where the $K_T$ (generally more than $N_B$) single fingerprints of the database are updated periodically. Furthermore, it is possible to update the $RSSI_{REF,b}$ values by means of adaptive algorithms, which change the thresholds according to the errors in the cell selection recognized during the online phase.
Another aspect to remark is that, due to the irregularity of the real RSSI field (e.g. in the presence of shadowing) and to the non-exact coincidence between the ideal designed borders (e.g. the lines delimiting the square cells considered here) and the RSSI field lines corresponding to the thresholds $RSSI_{REF,b}$, the real regions corresponding to the binary labels will not correspond exactly to the ideal cells. In our approach, this issue is addressed by updating the cells to the new shapes after the computation of the thresholds $RSSI_{REF,b}$. Also, it is possible that more binary labels than those used in the ideal design could appear because of the irregular lines of the RSSI values. In this case, two options are possible: (1) keep the new number of modified cells as the new label assignment in the region or (2) merge the new cells in order to obtain the same number $K_T$ of cells as in the original design.
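The quantization step and the code rate of the design can be illustrated with the short Python sketch below. It is only a sketch under stated assumptions: the names `quantize` and `code_rate` are hypothetical, the comparison with the threshold is taken as "greater than or equal to", and a missing measurement (represented here by `None`) is mapped to bit 0, which matches the behavior described later for signals below the receiver sensitivity.

```python
import math

def quantize(rssi, thresholds):
    """Binary label of an RSSI vector: one bit per beacon.

    A bit is 1 when the RSSI reaches the per-beacon threshold, 0 otherwise.
    A missing measurement (None) is treated as 0 (assumption of this sketch).
    """
    return tuple(1 if (r is not None and r >= t) else 0
                 for r, t in zip(rssi, thresholds))

def code_rate(num_cells, num_beacons, levels=2):
    """Ratio between the bits that enumerate the cells and the signature bits."""
    return math.log2(num_cells) / (num_beacons * math.log2(levels))

if __name__ == "__main__":
    # Threshold values are illustrative, not measured ones.
    print(quantize([-55.0, -80.0, -62.0], [-60.0, -60.0, -62.0]))
    print(code_rate(num_cells=16, num_beacons=4))
```

Updating the deployment after a change in the channel then amounts to replacing the `thresholds` list, one value per beacon, rather than re-surveying every fingerprint.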

Analysis of FP performance
The design of an FP-based IPS is a complex task since it is not limited to some general guidelines; most of the time, it requires a tuning phase after the system has been deployed, for example, it may be necessary to add or move some beacons, optimize the radio map, etc. The scope of the FP analysis is to help the designer and to save time and costs. This is possible by predicting analytically the expected performance of the systems. In this section, we propose an analytical solution for both cases, $L = 2$ and $L = +\infty$; the proposed solutions are then validated through numerical simulations and experimental results as shown in the ''BFP performance'' section.
The main metrics considered in this analysis are the label selection error (LSE) and the mean square error (MSE). Given an FP-based localization system with $K_T$ labels (or cells), the total MSE is given by
$$MSE = \frac{1}{K_T} \sum_{k=1}^{K_T} MSE_k \tag{13}$$
where $MSE_k$ is related to the error contributions coming from devices in the $k$th label (or cell)
$$MSE_k = \sum_{j=1, j \ne k}^{K_T} d^2_{j,k} \, P_E(j|k) \tag{14}$$
where $d^2_{j,k}$ is the squared distance between the points corresponding to the $j$th and the $k$th labels and $P_E(j|k)$ is the conditioned LSE of $j$ given $k$, or the probability of estimating the cell $j$ when the target is located in the cell $k$. The analytical model is based on the following assumptions: (1) the NN estimation method is considered for intercepting a performance upper bound; (2) the performance is computed limiting the target positions to the grid points at the centers of the cells. However, the analysis can be extended to all the points in each cell, also considering the issues related to the RSSI irregularities commented at the end of the ''BFP design'' section, whose impact is higher for the points closer to the cell borders.
The performance estimation can be applied also to experimental RSSI measures, including the impact of correlated shadowing. After the off-line FP phase, there are $K_T$ vectors of length $N_B$ with the form
$$R_k = \{R_k(1), \ldots, R_k(N_B)\} \tag{15}$$
The elements $R_k(b)$ ($b = 1, \ldots, N_B$) of the vector $R_k$ are assumed as the true mean of the RSSI from each beacon. Usually, this is achieved by collecting a large number of samples of the RSSI for each orientation of the target. The sample vector measured at the $k$th grid point is
$$\tilde{R}_k = \{\tilde{r}_{1,k}, \ldots, \tilde{r}_{N_B,k}\} \tag{16}$$
These measures can be obtained directly by experimental measures or they can be modeled according to equation (3). For the analysis, each component in equation (16) is assumed to be a random variable with the following assumptions: (1) the random variables $\tilde{r}_{i,k}$ (in dBm) for all $i$ are mutually independent; (2) the random variables $\tilde{r}_{i,k}$ (in dBm) are normally distributed; (3) the (sample) standard deviation of all the random variables $\tilde{r}_{i,k}$ is assumed to be identical and denoted by $\sigma_W$ (in dB).
For $L = +\infty$, the signal distance between the sample vector and the FP vectors is used to determine which of the points on the grid corresponds to the position of the target. The NN technique selects the $(x, y)$ coordinates corresponding to the FP vector with the smallest signal distance from the sample vector as the estimated location.
For L = 2, the Hamming distance is evaluated between competing labels (after the RSSI quantization) and the NN technique selects the (x, y) coordinates corresponding to the BFP vector with the smallest Hamming distance from the received binary vector as the estimated location.
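The two NN selection rules and the aggregation of the LSE probabilities into the total MSE can be sketched as follows. This is a minimal Python illustration, not the implementation used for the results: the function names are hypothetical, the cells are assumed to be equally likely (uniform average over the $K_T$ cells), and ties in the distance are resolved by taking the first candidate.

```python
import math

def euclidean(a, b):
    """Signal distance used in the unquantized case."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hamming(a, b):
    """Number of differing bits, used in the binary case."""
    return sum(x != y for x, y in zip(a, b))

def nearest_index(sample, signatures, distance):
    """Index of the stored signature closest to the sample (first one on ties)."""
    return min(range(len(signatures)), key=lambda k: distance(sample, signatures[k]))

def total_mse(centers, p_err):
    """Total MSE from cell centers and LSE probabilities.

    centers:  list of (x, y) cell centers.
    p_err:    p_err[k][j] is the probability of selecting cell j when in cell k.
    """
    acc = 0.0
    for k, (xk, yk) in enumerate(centers):
        for j, (xj, yj) in enumerate(centers):
            if j != k:
                acc += ((xj - xk) ** 2 + (yj - yk) ** 2) * p_err[k][j]
    return acc / len(centers)

if __name__ == "__main__":
    fps = [[-50.0, -70.0], [-60.0, -60.0], [-70.0, -50.0]]
    print(nearest_index([-58.0, -63.0], fps, euclidean))
    print(total_mse([(0.0, 0.0), (3.0, 0.0)], [[0.9, 0.1], [0.2, 0.8]]))
```

Passing the distance as an argument keeps the selection logic identical in the two cases, so that only the representation of the signatures changes between conventional FP and BFP.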

Case $L = +\infty$
The idea behind our approach comes from signal theory and the corresponding geometrical interpretation of the FP technique. For the sake of simplicity, let us assume a system with two beacons: the radio map can be seen as a set of points scattered in the signal space as in Figure 3. Given the online FP, the estimated position is the FP with the minimum signal distance. From a geometric point of view, this means that the online FP belongs to the Voronoi region of the estimated FP. Therefore, the LSE probability of $j$ given $k$ in equation (14) can be computed as
$$P_E(j|k) = \int_{V_j} f_k(\tilde{R}_k) \, d\tilde{R}_k \tag{17}$$
where $f_k(\tilde{R}_k)$ is the probability density function of the $k$th FP, and it is given by a multivariate normal distribution, thanks to equation (3), and $V_j$ is the Voronoi region related to the $j$th FP
$$V_j = \{\tilde{R} : \|\tilde{R} - R_j\| \le \|\tilde{R} - R_i\|, \ \forall i \ne j\} \tag{18}$$
The integral in equation (17) can be very complex, even with approximate numerical methods, especially when the number of beacons grows. Swangmuang and Krishnamurthy 19 propose to use the concept of proximity graph in order to compute a lower bound solution to the probability of selecting correctly an FP and therefore to optimize the radio-map. However, to give a more general analysis, we are interested in predicting the MSE or the LSE. Therefore, our solution approximates the Voronoi regions with hypercubes in the multidimensional space, making their calculation possible in all the cases, since the multi-dimensional integral reduces to the product of mono-dimensional integrals. The proposed approach can be summarized as follows: (1) compute the minimum signal distance; (2) for each $j$th FP, define a hypercube with side equal to the minimum signal distance in equation (19) and centered at the $j$th FP; (3) compute the LSE probabilities in equation (17) integrating the multivariate normal in the regular hypercube; (4) compute the final MSE with equations (14) and (13).
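The hypercube approximation can be illustrated with the Python sketch below, in which the multi-dimensional integral becomes a product of one-dimensional Gaussian integrals. The sketch makes assumptions that are not stated in the text: the minimum signal distance is taken as the global minimum over all pairs of FP vectors, the components are independent with a common standard deviation, and the function names are hypothetical.

```python
import math

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def min_signal_distance(fps):
    """Smallest Euclidean distance between any two FP vectors (assumed definition)."""
    return min(math.dist(a, b) for i, a in enumerate(fps) for b in fps[i + 1:])

def lse_hypercube(fps, j, k, sigma_w, side):
    """Approximate probability of selecting FP j when the target is at FP k.

    The Voronoi region of FP j is replaced by a hypercube of the given side
    centered at fps[j]; the measure is Gaussian around fps[k] in each dimension.
    """
    prob = 1.0
    for rj, rk in zip(fps[j], fps[k]):
        low = (rj - side / 2.0 - rk) / sigma_w
        high = (rj + side / 2.0 - rk) / sigma_w
        prob *= phi(high) - phi(low)   # one-dimensional integral per beacon
    return prob

if __name__ == "__main__":
    fps = [(-50.0, -70.0), (-60.0, -60.0), (-70.0, -50.0)]
    side = min_signal_distance(fps)
    print(lse_hypercube(fps, 1, 0, sigma_w=4.0, side=side))
```

Because the hypercubes do not tile the signal space exactly, the resulting probabilities need not sum to one over $j$; the sketch only shows why the computation stays tractable as the number of beacons grows.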

Case $L = 2$
In this case, which can be extended to the non-binary case with $L > 2$, we exploit the error correcting code interpretation. The error probability is estimated by the following steps: (1) for each FP vector $R_k$ in $M$, compute its binary version $BFP_k$, according to equation (12); (2) the Hamming distance $D_{H,j,k}$ between the $k$th and the $j$th FP vector is computed as
$$D_{H,j,k} = \sum_{b=1}^{N_B} BFP_j(b) \oplus BFP_k(b) \tag{20}$$
(3) the probability $P_E(j|k)$ in equation (14) is computed by considering the $N_{j,k}$ error events $E^{(i)}_{j,k}$ ($i = 1, \ldots, N_{j,k}$) that cause the mis-detection in the cell $j$ given the considered cell $k$. If, for example, the Hamming distance $D_{H,j,k} = 1$, then $N_{j,k} = 1$ and the error event $E^{(1)}_{j,k}$ is a vector of all zeros except for one element. If we denote with $E^{(i)}_0$ the set of bits in $E^{(i)}_{j,k}$ equal to 0 (correct bits) and with $E^{(i)}_1$ those equal to $\pm 1$ (incorrect bits), the error probability will be approximated by
$$P_E(j|k) \approx \sum_{i=1}^{N_{j,k}} \prod_{b \in E^{(i)}_1} p\left(E^{(i)}_{j,k}(b)\right) \prod_{b \in E^{(i)}_0} \left[1 - p\left(E^{(i)}_{j,k}(b)\right)\right] \tag{21}$$
where $p(E^{(i)}_{j,k}(b))$ is the probability of an incorrect beacon detection (i.e. the bit assignment based on equation (12) turns out to be incorrect) and it is affected by the assumptions made on RSSI, which is computed according to equation (3), that is
$$p\left(E^{(i)}_{j,k}(b)\right) = Q\left(\frac{|R_k(b) - RSSI_{REF,b}|}{\sigma_W}\right) \tag{22}$$
where $Q(\cdot)$ is the Gaussian tail function. When all the LSE probabilities $P_E(j|k)$ are computed, equations (14) and (13) are applied for obtaining the final MSE.
To extend the proposed method for considering not only the center of each cell but any random position, equation (22) has to be averaged over the entire cell changing the signal distance at the numerator; it is clear that the points closer to the cell borders will show the highest error contributions. In order to facilitate the integral, it is possible to use the distributions of the signal derived in Appendix 1, which can be considered approximately Gaussian when the cell size is small.
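The bit-level computation behind the binary case can be sketched in Python as follows. The sketch assumes Gaussian measure errors with a common standard deviation and independent bits, so that the probability of a wrong bit is a Gaussian tail evaluated at the distance between the mean RSSI and the threshold. It also simplifies the analysis by considering a single error event per pair of cells, namely the one in which the received word coincides exactly with the label of cell $j$; all function names are hypothetical.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def bit_error_prob(mean_rssi, threshold, sigma_w):
    """Probability that noise moves the measure across the threshold."""
    return q_function(abs(mean_rssi - threshold) / sigma_w)

def hamming(a, b):
    """Hamming distance between two binary labels."""
    return sum(x != y for x, y in zip(a, b))

def label_error_prob(fp_k, label_k, label_j, thresholds, sigma_w):
    """Probability of receiving exactly label_j when the target is in cell k.

    Bits that differ between the two labels must flip; the others must not.
    """
    prob = 1.0
    for r, bk, bj, t in zip(fp_k, label_k, label_j, thresholds):
        pe = bit_error_prob(r, t, sigma_w)
        prob *= pe if bk != bj else (1.0 - pe)
    return prob

if __name__ == "__main__":
    # Illustrative values: two beacons, thresholds at -60 dBm, sigma_W of 5 dB.
    print(label_error_prob([-50.0, -70.0], (1, 0), (0, 0), [-60.0, -60.0], 5.0))
```

Averaging over a whole cell instead of its center would replace `mean_rssi` by the RSSI at each point of the cell, which makes the points near the borders (where the numerator is small) the dominant contributors to the error.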

BFP performance
The main advantage of reducing the RSSI to a binary representation is the increase in computational efficiency. BFP provides two advantages from the computational point of view: (1) a decrease of the storage memory (and/or of the time necessary for the transmission of the data necessary for positioning from/to the target devices), thanks to the binary representation of the FP; (2) an increase of the algorithmic efficiency, which follows directly from the iterative code construction presented in the ''BFP design'' section: the signature search can proceed in a logarithmic way, starting from the most significant bits (the first beacons, with the longest transmission range), halving the search area at each bit, and finishing with the least significant bits, corresponding to the fine division among single cells. Therefore, the logarithmic search is computationally more efficient since the best solution can be found in $\log_2(K_T)$ steps instead of $K_T$. In addition, it has an interesting property with respect to the sequential search, from an application point of view: it provides a scalable localization precision, since each algorithm step provides a finer resolution of the area in which the target could be located. The localization process could stop at a number of bits lower than $N_B$, according to the application or the context. The validation of the proposed technique is carried out through real experiments and numerical simulations. The case with $L = +\infty$ is reported as a benchmark.
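The logarithmic search with scalable resolution can be sketched as a bit-by-bit halving of an ordered list of cells. The Python sketch below assumes that the cells are stored in the order of their binary labels and that all $2^n$ labels are present (code rate 1); the function name `locate` and the `max_bits` parameter are hypothetical and serve only to show how the search can stop early at a coarser resolution.

```python
def locate(bits, sorted_cells, max_bits=None):
    """Narrow the candidate cells one bit at a time.

    bits:         received binary label, most significant bit first.
    sorted_cells: cells ordered by binary label, with all 2**n labels present.
    max_bits:     optional early stop, returning a coarser region.
    """
    low, high = 0, len(sorted_cells)
    used = bits if max_bits is None else bits[:max_bits]
    for bit in used:
        mid = (low + high) // 2
        if bit:
            low = mid      # keep the upper half of the current region
        else:
            high = mid     # keep the lower half of the current region
    return sorted_cells[low:high]

if __name__ == "__main__":
    cells = list(range(16))                       # 16 cells, labels 0000..1111
    print(locate((1, 0, 1, 1), cells))            # single cell after 4 steps
    print(locate((1, 0, 1, 1), cells, max_bits=2))  # coarser region after 2 steps
```

Each iteration halves the candidate region, so a full localization over 16 cells takes 4 steps, while stopping after 2 bits returns a region of 4 cells.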

Experimental validation
The experimental measurements are collected in a classroom of the Politecnico di Milano, as described in Figure 1. The beacons used are BLE sensors from Silicon Labs, 20 while the target user is the BLE 112 development board. 20 The beacons continuously transmit BLE packets that contain the MAC address of the transmitter; the receiver board stores, for each received packet, the ID of the sender, the estimated RSSI, and the time-stamp. For each of the 16 FP points, we have measured approximately 3000 samples from each beacon (5 min, with a rate of 10 measurements/s).
The results in Figure 4 show the cumulative distribution function of the error from the real experiment and the numerical simulation, while the analysis for both cases, under the same conditions, shows an average error of 4.6 m for $L = 2$ and 3.1 m for the benchmark, which is in line with the experimental findings.

Numerical validation
In order to extend the investigation of the impact of the main design or channel model parameters on the MSE, we make the following assumptions for the remaining simulations and analysis, unless otherwise specified: (1) the parameters of the channel model are taken from Table 1 and are derived from the experimental measurements; (2) the elements of the sample vectors $\tilde{R} = \{\tilde{r}_1, \ldots, \tilde{r}_b, \ldots, \tilde{r}_{N_B}\}$ are assumed independent, with a standard deviation $\sigma_W$ (dB); (3) a square grid $K_T = 4 \times 4$ is considered, with a resolution $\Delta$ equal to 3.15 m (as in Figure 1), and the reference scenario is a square room with side $s = 12.6$ m; (4) the NN algorithm is used to estimate the target position; (5) the binary labels derived for the scenario under consideration are reported in Figure 5; (6) the receiver sensitivity is ideal ($-\infty$); (7) in the simulations, the target locations are extracted randomly among the cell centers and the number of runs is $10^4$ for each point in the plot.
Numerical results in Figures 6 and 7 show how the MSE varies as a function of the standard deviation $\sigma_W$ of the measurement error and of the path loss exponent, respectively. Each plot compares analytical and simulated results with $L = 2$ and $L = +\infty$. From these results, we can observe that (1) when the measurements become less reliable, that is, with a higher standard deviation $\sigma_W$ of the measurement error, the MSE increases in both cases. However, the gap between the two techniques is smaller when $\sigma_W$ is low or high. This happens since the LSE probabilities become comparable, and very low or very high, respectively, in these cases; (2) the performance gap of around 2 dB between $L = +\infty$ and $L = 2$ can be compensated by a greater number of measures to be averaged in the binary case. In fact, averaging three independent RSSI measures decreases the effective $\sigma_W$ by about 2 dB. This approach also requires a trade-off between positioning quality and latency; (3) Figure 7 shows that, as the propagation becomes more difficult, that is, as the path loss exponent increases, the MSE falls in both cases. This effect is due to the increase of the average signal distance between all the FPs, which decreases the probability of error.
Finally, Figure 8 shows an interesting property of BFP with respect to $L = +\infty$: when the receiver sensitivity is not ideal but limited, the binary case appears more robust since, when the received signal from a beacon decreases below the minimum detectable level, the corresponding bit is automatically set to zero, which is coherent with the corresponding binary representation of the labels. On the contrary, the $L = +\infty$ case suffers from a reduced precision of the missing RSSI measures, which can be appreciated when the room size increases and the signals of some of the farthest beacons decrease below the minimum detectable level.
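The effect of averaging several measures on the effective measurement error can be checked with a short Python sketch. It assumes that the measures are independent, Gaussian, and averaged directly in dB, so that the standard deviation scales with the inverse square root of the number of measures; the starting value of 5 dB is a hypothetical example, not a value taken from the experiments, and the function names are illustrative.

```python
import math
import random

def effective_sigma(sigma_w, n):
    """Standard deviation of the mean of n independent measures."""
    return sigma_w / math.sqrt(n)

def empirical_sigma(sigma_w, n, runs=20000, seed=1):
    """Monte Carlo estimate of the same quantity, as a sanity check."""
    rng = random.Random(seed)
    means = [sum(rng.gauss(0.0, sigma_w) for _ in range(n)) / n
             for _ in range(runs)]
    mu = sum(means) / runs
    return math.sqrt(sum((m - mu) ** 2 for m in means) / (runs - 1))

if __name__ == "__main__":
    print(effective_sigma(5.0, 3))   # analytical value for three measures
    print(empirical_sigma(5.0, 3))   # simulated value, close to the analytical one
```

Under these assumptions, three measures reduce a standard deviation of 5 dB to roughly 2.9 dB, at the price of a latency three times higher for each position estimate.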

Conclusion
In this article, we have developed the performance analysis in case of binary quantization in the RSSI signatures for FP localization. The study has exploited the design principles of BFP and the related similarities with the binary codes theory. In fact, using a single bit to represent the RSSI, it is possible to make the layout design starting from the concept of Hamming distance between the vectors of the radio map, which is directly related to the localization performance. The analysis and the simulations have revealed the performance compromise between BFP and the ideal case without quantization and the impact of the main channel parameters as well. The BFP looks computationally efficient, often with comparable performance with respect to conventional FP and suited to scenarios in which the computational and storage simplicity are the primary design factors, as for BLE devices, radio frequency identification (RFID) tags, or microsensors. Nevertheless, it is always possible to tune a performance compromise between the number of beacons and the number of measures for each position estimate with a consequent impact on the latency.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.