5 G WiFi Signal-Based Indoor Localization System Using Cluster k-Nearest Neighbor Algorithm

Indoor localization based on existing WiFi signal strength is becoming increasingly prevalent. Unfortunately, the WiFi received signal strength (RSS) is susceptible to multipath, signal attenuation, and environmental changes, which is the major challenge for accurate indoor localization. To overcome these limitations, we propose a cluster k-nearest neighbor (KNN) algorithm that uses the 5 G WiFi signal to reduce environmental interference and improve localization performance without additional equipment. We propose three approaches to improve the performance of the localization algorithm. First, we reduce the computation effort with a coarse localization algorithm. Second, based on a detailed analysis of the 2.4 G and 5 G signal fluctuation, we expand the real-time measurement RSS before matching it against the fingerprint map. Third, and most importantly, we select the optimal nearest neighbor points with the proposed cluster KNN algorithm. We have implemented the proposed algorithm and compared its performance with existing popular algorithms. Experimental results demonstrate that the proposed algorithm effectively improves localization accuracy and exhibits superior performance in terms of localization stabilization and computation effort.


Introduction
Recently, indoor localization based on wireless network signals and mobile devices has received much attention. It can provide accurate localization in indoor environments such as shopping malls, airports, hospitals, subways, and university campuses. For outdoor localization, global navigation satellite systems (GNSS) such as GPS [1,2] have been used in a wide range of applications including tracking, transport navigation, and guidance. Although GPS works extremely well outdoors, it does not perform well indoors, in urban canyons, in basements, or close to walls, because the signal from the GPS satellites is too weak to penetrate most buildings. To achieve accurate indoor localization, many technologies have been proposed, such as infrared [3], ultrawideband (UWB) [4], ultrasonic [5], Bluetooth [6], Radio Frequency Identification (RFID) [7], Zigbee [8], frequency modulation (FM) broadcast [9,10], geomagnetism [11], and Wireless Fidelity (WiFi) [12][13][14][15][16].
Mobile computing technology and the increasing availability of WiFi networks have enabled more accurate localization in indoor environments. Most WiFi-based indoor localization models for commercial use are based on received signal strength. WiFi-based localization systems have several advantages. In terms of cost, WLAN-based localization does not need any additional hardware, as network interface cards measure signal strength values from all wireless access points (APs) in range of the receiver. The RSS needed for localization can therefore be obtained directly from the network interface cards that are included in most mobile devices. Owing to the ubiquity of WiFi networks, this mode of localization provides a particularly cost-effective solution for offering location based service (LBS) in commercial and residential indoor environments. Because of its low cost and wide availability without the need for additional hardware, WiFi indoor localization is becoming increasingly prevalent.
International Journal of Distributed Sensor Networks
There are different choices of localization measurement method, such as time of arrival (TOA) [17,18], time difference of arrival (TDOA) [19], angle of arrival (AOA) [20], channel state information (CSI) [21,22], and received signal strength (RSS) [23][24][25][26]. RSS is generally the feature of choice for indoor WiFi localization because of its low cost and because it needs no additional hardware on mobile devices.
Most RSS-based indoor localization approaches adopt the fingerprint algorithm as their basic scheme. Even though the fingerprint algorithm has been successful in many localization systems [14,24,27,28], it faces several challenges in real indoor environments.
(i) Environmental variations and interference: ideally, signal variations should be affected only by distance. The longer the distance to an access point (AP), the smaller the RSS. However, interference and undesired effects such as reflection can make the system detect a smaller RSS at a point that is closer to the AP. These effects may arise under different conditions. The fingerprint is sensitive to environmental changes such as an object moving into the building, diffraction, and reflection, which change the signal propagation.
(ii) 2.4 G as an unlicensed spectrum: hardware and applications other than WiFi can freely use this spectrum and introduce interference and noise. Common equipment that operates at this frequency and can interfere with WiFi includes household microwave ovens, Bluetooth devices, telecommunications devices, wireless security cameras, wireless speakers, and so on.
(iii) The time-consuming offline training phase to build the fingerprint map: during the fingerprint collection phase, a large number of empirical manual measurements are required to build the fingerprint map.
In this work, in order to reduce the manual effort during the calibration phase, we have developed computer-aided software that allows us to build the fingerprint map automatically. The software imports the indoor layout and then collects the fingerprint at each reference point automatically. To address the instability of the signal, we propose an indoor localization system using the cluster KNN algorithm; it aims both at reducing interference by using the 5 G signal and at using the cluster KNN algorithm to reduce the computation effort while improving localization accuracy and stabilization.
The remainder of this paper is organized as follows: Section 2 discusses how related work available in the literature approaches such issues. A brief description of indoor localization system is presented in Section 3. Section 4 describes the coarse and precise localization algorithm in detail. The characteristics of 2.4 G and 5 G band signal are analysed in Section 5. The experiment results and evaluations through implementation are listed in Section 6. Finally, Section 7 concludes the paper.

Fingerprint Collection.
The major problem of the fingerprint algorithm is the exhaustive survey needed to build the fingerprint map, a task that requires substantial cost and time. Another important issue is that the fingerprint map must be recalibrated every time the environment changes.
There are mainly two methods to build a fingerprint map: empirical manual measurement [14,15] and analytical computation based on a signal propagation model [12,25,29]. Because the signal propagation model is easily affected by environmental changes, the fingerprint map is normally built with manual effort. In this phase, the fingerprint map is surveyed for all the reference points (RPs). Basically, a fingerprint map is a database of reference points at predefined coordinates, each coupled with radio signal characteristics (for example, RSS, signal angles, or propagation time) called the signal fingerprint.
For every fingerprint, there must be a measurement that records the position and the signal strength received there.
Recent research highlights the strong need for approaches that reduce the time associated with the offline training phase of the fingerprint algorithm. In [30], an approach capable of reducing the heavy effort of the training phase is identified as one of the key challenges in fingerprinting. In [31], it is argued that a valid training phase is hardly bearable since it requires collecting a large number of fingerprints. To reduce this number, [31] proposes trading localization error against time, thus reducing the time needed to train the fingerprint map. Similarly, [32] notes that a huge amount of received signal strength data is usually required for training and that much time is typically needed to collect it. For this reason, [32] states that the manual effort can be reduced by minimizing the sampling time at each reference point (RP) and by limiting the number of positions to sample. Nevertheless, this simple approach produces an inaccurate fingerprint map, which decreases the accuracy of the location estimation [32].
Training methods that try to shorten the training phase of the fingerprint map have been proposed in [33,34]. Some works also propose training the fingerprint with a mobile device such as a smartphone, employing WiFi scans transparently to the user [35]. In order to reduce the manual effort during the calibration phase, we have defined a computer-aided approach that allows us to build the fingerprint map automatically. In this system, a software application is developed to build the fingerprint map. It is developed with Java development kit 1.7.0 and Android software development kit 4.4. The general steps of the software application are as follows.
(i) Open the software application and import the indoor workspace plan.
(ii) Select points on the indoor map as reference points, collect the real-time RSS, and store it in the fingerprint map.
(iii) After the fingerprint information of all reference points has been collected, if the indoor environment changes, update the fingerprint map by reselecting only the affected reference points and refreshing their fingerprints.
(iv) Click the "localization" button to obtain the current position.

Matching Algorithm.
In recent years, several fingerprint localization algorithms have been proposed. The key idea of the fingerprint algorithm is to find the optimal nearest neighbor points. In an attempt to find the best matching algorithm and improve localization accuracy, researchers have proposed the nearest neighbor (NN), k-nearest neighbor (KNN) [36], weighted k-nearest neighbor (WKNN) [37], Bayesian probabilistic model (BPM) [38], artificial neural network (ANN) [39], and support vector machine (SVM) [40,41] to obtain the optimal nearest neighbor points.
Although the above algorithms can achieve adequate localization performance, their computing and memory requirements have to be taken into consideration. While smart mobile devices are highly capable machines, users do not want an application that takes gigabytes of data just to improve localization accuracy, so a tradeoff between accuracy and algorithm complexity is needed. To achieve acceptable localization accuracy with low computation effort, we propose an efficient indoor localization system in which localization consists of a coarse phase and a precise phase. The coarse phase reduces the computation, and the precise phase then uses the cluster KNN algorithm.
As wireless networks become more widespread, buildings are becoming larger and more complex, and fingerprint maps are growing accordingly. During the online localization phase, matching the real-time RSS against the whole fingerprint map requires too much computation for efficient localization. To reduce the computation effort and localization time, the coarse localization algorithm uses the detected APs to reduce the amount of fingerprint data to be matched. In addition, our experiments show that points with the same RSS value are sometimes scattered, which can cause large localization errors. Compared with current localization algorithms, the cluster KNN algorithm can remove the discrete nearest neighbor points and obtain the optimal nearest neighbor points.

System Architecture
As in most fingerprint approaches, the proposed system consists of an offline training and an online localization phase. The offline training phase is responsible for collecting signal strength from each reference point (RP) and then recording it in the fingerprint database. At the online localization phase, the mobile devices collect the real-time measurement RSS and compare it with the available fingerprint in the database.

Offline Training Phase.
During the offline training phase, a designer of a fingerprint localization system has to deal with two distinct problems as follows.
(i) Fingerprint collection: how to obtain the fingerprint map efficiently in a large and complex indoor building.
(ii) Search for the optimal AP placement: how many APs are needed and where they have to be placed to obtain optimal localization accuracy.
There are three distinct problems that we should consider in fingerprint collection: first, how to reduce the collection time; second, how to reconstruct the fingerprint map efficiently when the indoor environment changes; third, how to extract the fingerprint characteristic of the reference points. In our system, the fingerprint map is built by the developed software application, and the detailed steps are shown in Section 2.1. The software application not only builds the fingerprint map conveniently but also renews the fingerprint information of individual reference points without reconstructing the whole fingerprint map. During fingerprint collection, we select RSS as the fingerprint characteristic and use the media access control (MAC) address as the identity of each AP, because the service set identifier (SSID) of an AP can be changed manually in practice, whereas the MAC address is unique to every AP. The raw set of RSS time samples from each AP at each RP and orientation is collected automatically by the software. Since RSS varies noticeably with interference and environmental conditions, several consecutive RSS measurements need to be collected at each reference point over a period of time. The average of the RSS time samples is then computed and stored in the fingerprint database. Such a fingerprint map can be represented by $\Omega^{(o)}$, where $\psi^{(o)}_{i,j}$ is the fingerprint from AP $i$ at RP $j$ with orientation $o$, for $i = 1, 2, \ldots, L$, $j = 1, 2, \ldots, N$, and $o \in O$, where $O$ is a set of two orientations. $L$ is the total number of APs, and $N$ is the number of RPs.
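The averaging step described above can be illustrated with a minimal Python sketch. It is not the authors' Android application: the function name `build_fingerprint`, the scan format (a dict from MAC address to RSS), and the use of −100 for an AP that is never heard at an RP are assumptions made for illustration, the last one borrowed from the −100 dBm convention for undetected APs used later in the paper.

```python
from collections import defaultdict
from statistics import mean

# Assumed placeholder for an AP that is not heard at a reference point.
UNDETECTED_RSS = -100


def build_fingerprint(samples, all_macs):
    """Average raw RSS time samples per AP for one RP and one orientation.

    samples:  list of scans, each a dict {mac_address: rss_dbm}.
    all_macs: ordered list of every AP MAC address in the deployment.
    Returns one averaged RSS value per AP, in the order of all_macs.
    """
    per_ap = defaultdict(list)
    for scan in samples:
        for mac, rss in scan.items():
            per_ap[mac].append(rss)
    # APs without any sample at this RP get the placeholder value.
    return [mean(per_ap[mac]) if per_ap[mac] else UNDETECTED_RSS
            for mac in all_macs]


if __name__ == "__main__":
    scans = [{"aa": -60, "bb": -70}, {"aa": -62}]
    print(build_fingerprint(scans, ["aa", "bb", "cc"]))
```

Keying the samples by MAC address rather than SSID follows the reasoning in the text that the SSID may change while the MAC address does not.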
AP placement consists of identifying the positions of the APs that achieve reasonable radio signal coverage of the workspace. In general, increasing the number of APs can improve localization accuracy, but it also increases deployment costs and the time needed to collect RSS from all APs when establishing the localization infrastructure and the fingerprint map. Moreover, the quality of a placement pattern depends heavily on the specific workspace conditions, such as wall positions and materials, space topography, noise sources, and the flow of people in the workspace.
In [42], the experimental performance shows that placing the APs in symmetric positions distributed over the experimental workspace, in such a way that the average signal power is high, is likely to be one of the best choices for reducing localization error. To reduce noise as far as possible, if we can deploy the APs ourselves, it is better to distribute them evenly around the workspace in a front-facing arrangement.

Algorithm 1: Coarse localization algorithm.
(1) the RSS of each undetected AP ← −100;
[...]
the unknown point is not adjacent to the $i$-th AP;
(5) delete [...];
(6) else
(7) the unknown point is adjacent to the $i$-th AP;
(8) store [...];
(9) end if
(10) end while
(11) count all [...];
(12) return coarse localization region $\Omega^{(\cdot)}$;

Online Localization Phase.
The online localization phase consists of coarse localization and precise localization. The goal of the coarse localization phase is to exclude impossible regions from the fingerprint map. It thus removes the fingerprints that cannot match and reduces the computation effort of the precise localization phase. The larger the fingerprint map is, the greater the reduction, as shown in Section 6.2. In the precise localization phase, the location is computed with the cluster KNN algorithm. Both phases are described in detail below.

Coarse Localization.
Owing to the wide deployment of APs, the total number of detectable APs is generally much greater than that required for localization, which leads to redundant computation. Furthermore, unreliable APs with large RSS variances may lead to biased estimation and reduce the stability of the localization accuracy. This motivates the use of AP selection techniques to select a subset of the available APs for coarse localization. Since mobile devices detect different sets of APs in different regions, the detected APs can be used for coarse localization.
Coarse localization is performed by comparing the detected APs to infer the rough localization region. The real-time measurement RSS can be represented as the columns of
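The coarse localization idea of this subsection, discarding reference points whose fingerprints cannot match the set of detected APs, can be sketched in Python as follows. This is an illustrative reading rather than the authors' Algorithm 1: the function name `coarse_region` is hypothetical, and the rule that an RP is kept only when every AP detected online also appears in its fingerprint is an assumption.

```python
# RSS value used for APs that are not detected (the paper's -100 dBm convention).
UNDETECTED_RSS = -100


def coarse_region(measurement, fingerprint_map):
    """Return indices of RPs whose fingerprints contain every detected AP.

    measurement:     list of RSS values, one per AP (UNDETECTED_RSS if not heard).
    fingerprint_map: list of fingerprints, each a list of RSS values per AP.
    """
    # Indices of the APs that the mobile device currently hears.
    detected = [i for i, rss in enumerate(measurement) if rss > UNDETECTED_RSS]
    region = []
    for j, fingerprint in enumerate(fingerprint_map):
        # Assumed rule: keep the RP only if all detected APs were also
        # observed there during the offline training phase.
        if all(fingerprint[i] > UNDETECTED_RSS for i in detected):
            region.append(j)
    return region


if __name__ == "__main__":
    fmap = [[-50, -100, -70], [-60, -80, -100], [-55, -75, -65]]
    print(coarse_region([-52, -100, -68], fmap))
```

Only the RPs returned by this filter need to be passed to the precise localization phase, which is where the reduction in matching effort comes from.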

Precise Localization.
The major challenge for accurate RSS-based localization comes from the variations of RSS caused by the dynamic and unpredictable nature of the radio channel; the RSS is easily affected by environmental changes such as shadowing and multipath. The RSS received at the same point varies over time, so a single real-time measurement is not fully reliable for accurate localization. To reduce the effect of signal variation, we analysed the 2.4 G and 5 G WiFi signals and found that the absolute value of the signal fluctuation is generally under 5 dBm, as shown in Figure 1: about 93% of all measured fluctuations (2.4 G and 5 G band signals together) are within 5 dBm. In our experiment, all 5 G signal fluctuations are within 5 dBm, while about 86% of the 2.4 G fluctuations are within 5 dBm. To avoid matching incorrect nearest neighbor points because of an unreliable real-time RSS, the RSS is expanded by ±5 dBm before the localization algorithm is run.
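One way to read the ±5 dBm expansion is as an interval test: a reference point remains a candidate if each of its fingerprint values falls inside the expanded interval around the corresponding real-time RSS. The Python sketch below illustrates that reading; the function names are hypothetical, and the paper does not spell out exactly how the expanded RSS enters the matching, so the per-AP interval rule is an assumption.

```python
# Fluctuation bound reported in the text (5 dBm).
TOLERANCE_DBM = 5


def matches_within_tolerance(measurement, fingerprint, tolerance=TOLERANCE_DBM):
    """True if every fingerprint value lies within +/- tolerance of the measured RSS."""
    return all(abs(m - f) <= tolerance for m, f in zip(measurement, fingerprint))


def candidate_rps(measurement, fingerprint_map, tolerance=TOLERANCE_DBM):
    """Indices of RPs that are compatible with the expanded real-time RSS."""
    return [j for j, fingerprint in enumerate(fingerprint_map)
            if matches_within_tolerance(measurement, fingerprint, tolerance)]


if __name__ == "__main__":
    fmap = [[-50, -70], [-58, -66], [-54, -73]]
    print(candidate_rps([-52, -69], fmap))
```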
In addition, the general signal propagation model is
$$PL(d) = PL(d_0) + 10\,n\,\log_{10}\!\left(\frac{d}{d_0}\right) + X_\sigma,$$
where $PL(d_0)$ represents the path loss at a reference distance $d_0$, typically one meter, $n$ is the propagation constant, $d$ is the distance between the transmitter and the receiver, and $X_\sigma$ is a Gaussian random variable with mean 0 and standard deviation $\sigma$. According to this model, mobile devices that are equidistant from an AP in line of sight receive the same signal strength. This phenomenon can cause large localization errors. In Figure 2, points of the same color represent the qualified nearest neighbor points; if the localization algorithm used all the qualified points, the localization error would be very large.
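The ambiguity described above can be checked numerically. The following Python sketch assumes the standard log-distance form of the propagation model, PL(d) = PL(d0) + 10 n log10(d/d0) + X_sigma, and uses illustrative parameter values (a 40 dB reference loss and n = 3) that are not taken from the paper.

```python
import math
import random


def path_loss_db(d, pl_d0=40.0, n=3.0, d0=1.0, sigma=0.0):
    """Log-distance path loss in dB at distance d (same unit as d0).

    pl_d0 and n are hypothetical example values, not measured ones.
    sigma > 0 adds zero-mean Gaussian shadowing with that standard deviation.
    """
    shadowing = random.gauss(0.0, sigma) if sigma > 0 else 0.0
    return pl_d0 + 10.0 * n * math.log10(d / d0) + shadowing


if __name__ == "__main__":
    ap = (0.0, 0.0)
    # Three different positions that are all 5 m from the AP.
    for point in [(3.0, 4.0), (-4.0, 3.0), (5.0, 0.0)]:
        d = math.dist(ap, point)
        print(point, round(path_loss_db(d), 2))
```

Without shadowing, all three positions yield the same path loss, so RSS from a single AP cannot distinguish them; this is the situation in which scattered points with equal RSS appear among the nearest neighbors.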
In our experiment, the signal strength ranges between −30 dBm and −99 dBm, and the RSS of every undetected AP is set to −100 dBm. Many different RPs have the same RSS value in the fingerprint map; if we use traditional nearest neighbor algorithms to find the optimal matching RPs, large localization errors result. To find the optimal matching RPs, we consider not only the RSS value but also the relationship between RPs. In the offline training phase, the fingerprint map is collected in sequential order, so the subscript of an RP represents not only its serial number but also its spatial relationship to other RPs. Generally, adjacent RPs have similar RSS values, so when searching for the nearest neighbor points the real-time RSS should be matched with adjacent RPs that have similar RSS values. The detailed procedure is given in Algorithm 2.
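Since Algorithm 2 is not reproduced in this text, the following Python sketch shows only one possible reading of the idea: candidate RPs are grouped into runs of adjacent serial numbers, isolated (discrete) candidates are discarded by keeping the largest run, and the k closest RPs within that run are returned. The function name `cluster_neighbors`, the adjacency rule (`max_gap`), and the tie-breaking by mean distance are all assumptions.

```python
def cluster_neighbors(distances, k=3, max_gap=1):
    """Pick up to k nearest RPs from the largest run of adjacent candidates.

    distances: dict {rp_index: signal_distance} for the candidate RPs,
               where rp_index is the serial number from the offline survey.
    """
    if not distances:
        return []
    indices = sorted(distances)
    # Group candidate indices into clusters of consecutive serial numbers.
    clusters, current = [], [indices[0]]
    for idx in indices[1:]:
        if idx - current[-1] <= max_gap:
            current.append(idx)
        else:
            clusters.append(current)
            current = [idx]
    clusters.append(current)
    # Assumed selection rule: largest cluster, ties broken by smaller mean distance.
    best = max(
        clusters,
        key=lambda c: (len(c), -sum(distances[i] for i in c) / len(c)),
    )
    # Within the chosen cluster, keep the k RPs closest in signal space.
    return sorted(best, key=lambda i: distances[i])[:k]


if __name__ == "__main__":
    candidates = {3: 4.0, 4: 2.0, 5: 3.0, 6: 6.0, 21: 1.0, 30: 2.5}
    print(cluster_neighbors(candidates))
```

In the example, RP 21 has the smallest signal distance but is isolated from the other candidates, so it is treated as a discrete point and excluded, which is the behaviour the text attributes to the cluster KNN algorithm.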

According to the precise localization algorithm, we select three optimal nearest neighbor points for localization; the final location estimate $(\hat{x}, \hat{y})$ is computed from these three points (Equation (3)).

Accurate Localization Based on 5 G WiFi Signal
The indoor radio environment is quite complex. Because 2.4 G is an unlicensed band, a wide variety of equipment uses it at the same time, and interference from the same-frequency equipment introduced in Section 1 is unavoidable. The interfering signals cause fluctuation in the WiFi signal detected from surrounding access points. The entry-level speed of 5 G WiFi is 433 Mbps, at least three times that of 2.4 G WiFi, and high-performance 5 G WiFi can exceed 1 Gbps. The high transmission rate satisfies users' daily browsing needs and provides a stable, high-quality signal as well.
In this section, we explore the impact of the frequency band (2.4 G and 5 G) on localization accuracy. While 2.4 G was the only band originally used for WiFi, the 5 G band is increasingly used despite its poorer propagation characteristics resulting from higher-frequency operation. The 5 G band is less crowded and has far more spectrum available. From the perspective of a WiFi fingerprint localization system, in a typical environment today with APs using both the 2.4 G and 5 G bands, a measurement RSS collected during either the offline training phase or the online localization phase will likely include a mix of 2.4 G and 5 G APs. Figure 3 shows that the signal strength of 5 G is stronger than that of 2.4 G. Generally, a stronger signal is more stable, and a stable signal supports high localization accuracy.
In addition, Figures 4 and 5 show that the 5 G signal is more stable than the 2.4 G signal in the same environment. The curves from the 30th RP to the 35th RP in Figures 4 and 5 indicate that the 5 G signal has poorer propagation characteristics resulting from higher-frequency operation.
Today more and more APs are dual-band, and the 5 G signal seems to be much more suitable for indoor localization than the 2.4 G signal. Owing to the lack of cochannel interference, its more stable RSS can be used for accurate localization. The 5 G signal is therefore used for location estimation in our localization system.

Experiments Setup.
In this section, we present the implementation and experimental evaluation of the proposed system. The fingerprint database is collected with smartphones whose network cards can detect both 2.4 G and 5 G signals. The workspace is covered by five TP-LINK WDR6300 routers that emit both 2.4 G and 5 G WiFi signals. The total area is 36 × 25.8 meters, consisting of hallways and some classrooms. We select 40 reference points evenly across the hallway region and obtain the WiFi fingerprint database using a smartphone running our mobile software application, which collects RSS and builds the fingerprint map automatically. When collecting the fingerprint map, the smartphones are held at approximately the chest height of an adult and are sometimes rotated horizontally at the same position to face different directions. For the RP data, 20 samples are collected at each RP at a rate of 1 sample/second by a user walking through the hallway area.
6.1.1. Performance Metric. According to [9,43], Manhattan distance performs slightly better than Euclidean distance, and our workspace is a regular rectangle; we therefore use the Manhattan distance as the standard for error analysis. The Manhattan distance is a metric defined as the sum of the absolute differences between the values of a real-time measurement RSS and those of a fingerprint:
$$\mathrm{ManDist}\bigl(r^{(o)}, \psi^{(o)}_{j}\bigr) = \sum_{i=1}^{L} \bigl| r^{(o)}_{i} - \psi^{(o)}_{i,j} \bigr|,$$
where $\mathrm{ManDist}(\cdot,\cdot)$ is the Manhattan distance function, $r^{(o)}_{i}$ is the real-time measurement RSS from AP $i$, $\psi^{(o)}_{i,j}$ is the corresponding entry of the fingerprint database at RP $j$ with orientation $o$, and $L$ is the total number of APs.
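As a small illustration of the metric defined in words above (the sum of absolute differences between a real-time RSS vector and a stored fingerprint), the following Python sketch computes the Manhattan distance; the function name and the list-based vector format are assumptions.

```python
def manhattan_distance(measurement, fingerprint):
    """Sum of absolute RSS differences between a measurement and a fingerprint.

    Both arguments are sequences of RSS values in dBm, one entry per AP,
    listed in the same AP order.
    """
    if len(measurement) != len(fingerprint):
        raise ValueError("measurement and fingerprint must cover the same APs")
    return sum(abs(m - f) for m, f in zip(measurement, fingerprint))


if __name__ == "__main__":
    online = [-52, -69, -100]
    stored = [-50, -70, -100]
    print(manhattan_distance(online, stored))
```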

Algorithms Compared.
We run the following algorithms for comparison.
(i) k-nearest neighbor (KNN): this is the most widely used algorithm, owing to its excellent tradeoff between accuracy and computation complexity. In the online localization phase, it finds the k nearest neighbors in signal space among the known fingerprints. "k" is a parameter tuned for each localization system to obtain better performance.
(ii) Weighted k-nearest neighbor (WKNN): the procedure is similar to the k-nearest neighbor. The only difference is that the average of the coordinates is a weighted average.
(iii) Fuzzy logic: it is used to select which points are the most important for calculating the final coordinates of the current position and to assess their corresponding weights in the average. As for the other algorithms, the first step, after acquiring the current value of the received signal strength, is to determine the distance in the signal domain between the current position and all the points in the fingerprint map. The next step is to transform these distance values into grades of membership; that is, the fuzzification is performed.
(iv) Bayesian histogram method: the probabilistic method is more complex and is based on the Bayes rule. It is generally classified into the kernel method and the histogram method. In the kernel method, a probability mass is assigned to a kernel around the observed data; the probability is then computed using a kernel function. The histogram method (used in our experiments) uses bins, or value categories, to cover the whole measurement range; from these bins we can calculate the probability of observing an AP at a certain position. Each AP thus appears with a different probability, and we can estimate the location according to these probabilities.
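The first two baselines can be sketched compactly in Python. This is a generic illustration, not the code used in the experiments: the function name `knn_estimate` is hypothetical, the Manhattan distance is used in signal space as in the performance metric, and the inverse-distance weights for WKNN are a common choice assumed here because the text does not specify the weighting.

```python
def knn_estimate(measurement, fingerprint_map, coords, k=3, weighted=False):
    """Estimate (x, y) from the k fingerprints nearest in signal space.

    measurement:     list of RSS values, one per AP.
    fingerprint_map: list of fingerprints (lists of RSS values), one per RP.
    coords:          list of (x, y) positions, one per RP.
    weighted:        False for KNN (plain average), True for WKNN.
    """
    # Manhattan distance in signal space to every reference point.
    dists = [sum(abs(m - f) for m, f in zip(measurement, fp))
             for fp in fingerprint_map]
    nearest = sorted(range(len(dists)), key=dists.__getitem__)[:k]
    if weighted:
        # Assumed weighting: inverse distance, with a small constant
        # to avoid division by zero on an exact match.
        weights = [1.0 / (dists[j] + 1e-9) for j in nearest]
    else:
        weights = [1.0] * len(nearest)
    total = sum(weights)
    x = sum(w * coords[j][0] for w, j in zip(weights, nearest)) / total
    y = sum(w * coords[j][1] for w, j in zip(weights, nearest)) / total
    return x, y


if __name__ == "__main__":
    fmap = [[-50, -70], [-54, -66], [-60, -60], [-90, -40]]
    coords = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0), (10.0, 0.0)]
    print(knn_estimate([-51, -69], fmap, coords, k=2))
    print(knn_estimate([-51, -69], fmap, coords, k=2, weighted=True))
```

With the weighted variant the estimate moves toward the RP whose fingerprint is closer to the measurement, whereas plain KNN places it midway between the selected RPs.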

The Performance of Computation Effort.
In the coarse localization phase, the approximate localization region is inferred from the detected APs. Fingerprints that do not include the detected APs cannot correspond to the current position, so they need not be matched, which greatly reduces the computation effort in the precise localization phase. The larger the fingerprint database is, the greater the saving. From Figures 6 and 7, we can see that not all the APs can be detected at each point. Supposing that every unknown point can detect 5 APs and there are 40 RPs, the algorithm becomes more efficient as the number of RPs increases, and the computation effort is reduced dramatically as the fingerprint map grows, as shown in Figure 8.
[...] 3.7895, and 1.1500, respectively. (In this study, in order to simplify the expression of error, the distance between two neighboring RPs is defined as one unit of error, and units are omitted from all error data.)

The Performance of Localization Stabilization.
Besides localization accuracy, localization stabilization is also important for a localization system, so we also analyse the variance to evaluate it. In our experiment, the 2.4 G variances of the KNN, WKNN, fuzzy logic, histogram, and cluster KNN algorithms are 11.6129, 11.5713, 11.6833, 7.6819, and 1.8560, respectively. The 5 G variances of the same algorithms are 9.3477, 9.4714, 9.3102, 7.5370, and 1.1500, respectively, as shown in Figure 12. The smaller the variance is, the more stable the localization is. This indicates that the cluster KNN algorithm improves not only localization accuracy but also localization stabilization. (In this study, in order to simplify the expression of variance, the distance between two neighboring RPs is defined as one unit of error, and units are omitted from all variance data.)

Conclusion
This paper focuses on improving localization accuracy and stabilization and on reducing the computation effort with the proposed localization system, which consists of coarse and precise localization. In the coarse localization, we use the detected APs to infer the coarse localization region. It can eliminate the [...] with our developed software application automatically. This improves the efficiency of system operation enormously. In the precise localization, we propose two approaches to improve localization accuracy and stabilization. By analysing the variation of the WiFi signal, we find fluctuations of about ±5 dBm in the 2.4 G and 5 G signals, so the real-time measurement RSS alone is not accurate enough for location estimation. The real-time measurement RSS is therefore first expanded by ±5 dBm and then matched against the fingerprint map. To select the optimal nearest neighbor points, we use the relationship between RPs to eliminate the discrete nearest neighbor points. Experimental results indicate that the proposed algorithm is the most accurate and stable of all the compared algorithms. In addition, our experiment also demonstrates that the 5 G WiFi signal is more stable for indoor localization.
In the future, we intend to apply this system in larger indoor buildings and to investigate the performance of the proposed algorithm in more complex environments and multistory buildings.