A study on the cabin indoor localization algorithm based on adaptive K-values

In ocean voyages, cabin indoor localization plays an important role in ship safety management. Because of the special structure of ships and the influence of the internal and external environment, mainstream indoor positioning technologies perform poorly inside the cabin. To improve positioning accuracy, this article proposes an adaptive K-value-based cabin indoor positioning algorithm. The algorithm constructs a fingerprint library by collecting and filtering Received Signal Strength Indication (RSSI) data; matches fingerprint information with adaptive K-values during localization, using a double K-nearest neighbor (KNN) scheme to reduce the influence of outlier points; and finally uses the multi-point center-of-mass method to determine the localization result. Experimental results in the automated cabin laboratory show that the average positioning error of the algorithm is 1.74 m, which meets cabin indoor positioning requirements.


Introduction
Although the level of automation in modern ship management is increasing, regular inspections by the engine crew remain indispensable to ship safety management in China. The crew plays a vital role in the safety management of the ship's engine; however, because of the special working environment on board, it is difficult for the chief engineer to supervise the duty crew effectively. If crew members are not at their posts and dangerous problems are not found and handled in time, the consequences may be extremely serious. Using indoor localization technology to manage the duty personnel in the engine room can effectively reduce accidents caused by human factors. 1 Currently, common indoor localization technologies include Ultra-Wideband (UWB) technology, 2 Radio Frequency Identification (RFID), Bluetooth, 3 Wireless Local Area Networks (WLANs), Light Detection and Ranging (LiDAR), and so on. Some of these localization technologies are already commercially available. 4 Each technique has its own advantages and limitations, and choosing a technique suited to the complexity and diversity of the indoor scene yields more accurate positioning. The internal environment of a ship's cabin differs from the usual indoor environment on land: 5 it is more complex, has more interference factors, and the positioning equipment is easily eroded. Among the above localization technologies, Bluetooth positioning has become the first choice for indoor localization in the ship environment because of its low cost, small volume, easy deployment, low power consumption, and other advantages. 6
Indoor localization based on Bluetooth Low Energy (BLE) usually uses Received Signal Strength Indication (RSSI) data either for real-time ranging between the local BLE node and the mobile terminal, 7 or in fingerprint-based methods, to obtain the location of the mobile terminal in real time. 8 Indoors, however, RSSI is usually affected by multi-path propagation and Non-Line-of-Sight (NLOS) conditions, which may degrade the final positioning results. Compared with range-based positioning algorithms, RSSI-based fingerprint positioning algorithms have better localization accuracy. 9 However, traditional fingerprint algorithms mainly use K-nearest neighbor (KNN) or weighted K-nearest neighbor (WKNN), 10 which perform well in a typical indoor environment; in a complex cabin environment, the RSSI signals suffer more interference and the localization performance is not ideal.
To improve indoor positioning accuracy in the special environment of ships, this article proposes an indoor localization algorithm for ship cabins based on adaptive K-values. The method filters the collected Bluetooth RSSI signal to reduce the influence of signal instability. It also improves the KNN method, using a double-layer adaptive KNN to obtain the final positioning result. The experimental results show that the proposed algorithm considerably improves positioning accuracy in the ship environment compared with the traditional method.
The rest of this article is organized as follows. Section "Related work" introduces the relevant work on Bluetooth localization. Section "KNN fingerprint localization algorithm based on adaptive K-values" details the cabin indoor localization algorithm based on adaptive K-values. Section "Experiments and results" describes the experimental process and result analysis. Section "Conclusion" summarizes this article and looks into the future.

Related work
In recent years, with the rapid emergence of various large and complex buildings, there is an increasing demand for indoor positioning. For this reason, scholars at home and abroad have done a lot of research in the field of indoor positioning, and have also proposed a series of positioning techniques, 11,12 such as Wi-Fi, UWB, BLE, and ultrasonic. Each localization technology has its own advantages and drawbacks, and this section focuses on BLE-based indoor localization technology.
Unlike typical indoor environments, indoor localization in cabin environments mainly faces the following challenges:
1. Complex spatial structure: the cabin contains many machines and pieces of equipment. Because the internal space of a ship is not open, the multi-path effect caused by object shading is pronounced, so ordinary fitted fingerprint data are not suitable for the ship's cabin environment.
2. Equipment loss: cargo ships sail at sea for long periods, where the natural environment is much harsher than on land. In addition, starting the main engine produces strong noise and vibration, which severely erodes equipment.
3. Magnetic field interference: the many electronic devices inside the engine room affect the local magnetic field, reducing the positioning accuracy of the traditional method.
Therefore, considering cost and environmental factors, Bluetooth is more suitable than Wi-Fi, UWB, and so on as a signal source for indoor localization in the cabin environment. Domestic and foreign scholars have done much research on Bluetooth indoor positioning.
Jin et al. 13 used coarse and fine granularity division to build a fingerprint library about the location mapping of RSSI sampling points, and proposed an adaptive Bluetooth fingerprint localization algorithm based on region preference. This algorithm can effectively remove points far from the localization point, but for the indoor environment with complex structure, the localization accuracy will be much reduced.
Peng et al. 14 established a regional discrimination model for the RSSI signal and built RSSI vector groups by region. They then obtained weighted values of the RSSI signal by Bayesian estimation. Finally, the discrimination model identified the region of the received RSSI signal and performed multi-point center-of-mass localization. The model was tested on fitted data, so its actual localization performance is still unknown. To improve the proximity detection performance of Bluetooth nodes, Mackey et al. 15 applied three different Bayesian filters: the Kalman filter (KF), the particle filter (PF), and the nonparametric information filter (NIF). The experimental results show that the Bayesian filters improve proximity detection performance by 30%.
Yu et al. 16 proposed a three-dimensional (3D) indoor localization algorithm combining low-power Bluetooth with multiple sensors. It combines an Inertial Navigation System (INS) with Pedestrian Dead Reckoning (PDR) to estimate heading and speed accurately. The algorithm achieves meter-level 2D positioning accuracy and sub-meter-level 3D height estimation accuracy in a typical indoor environment, and the combined positioning accuracy reaches 1.5 m in 75% of cases. However, it relies on several sensors, and some smartphones lack the required devices, which makes localization difficult to achieve. Moreover, because of hardware deviation, different devices may receive the signal with different offsets, so further calibration is required.
From the related work above, it can be seen that current Bluetooth-based indoor positioning mainly focuses on the construction of the fingerprint library and the filtering of real-time signals, and often needs a number of sensing devices for auxiliary localization. However, multiple sensors cannot be used for auxiliary positioning in the cabin environment because of the special structure of the ship: installing the localization equipment may require punching holes in the hull, which damages the hull structure and creates safety risks. 17,18 Without multi-sensor-assisted localization, however, the positioning accuracy would be greatly reduced. Therefore, for the special environment of ship

KNN fingerprint localization algorithm based on adaptive K-values
KNN is the most commonly used method in fingerprint localization, and the method in this article builds on it. The KNN algorithm first selects the K (K ≥ 2) groups of fingerprint data closest to the point to be located, and then calculates the final localization coordinate 19 with the center-of-mass algorithm. To improve indoor localization accuracy in the cabin environment, this article proposes a dual KNN localization algorithm with adaptive K-values. In the offline phase, the RSSI signals are collected in the cabin laboratory. The offline fingerprint library is constructed and the fingerprint data are stored in the database in the format <coordinates, RSSI_1, RSSI_2, ..., RSSI_n>. 20 Ten sets of data are collected for each fingerprint point. To reduce the fingerprint matching time, the 10 sets of data are reduced to one by formula (1), where RSSI_final is the optimized fingerprint value, and MAX_rssi and MIN_rssi denote the maximum and minimum of the 10 RSSI readings, respectively. The whole localization process is divided into two modules: the offline module establishes the fingerprint database, and the online localization module applies the double KNN localization method with adaptive K-values, as shown in Figure 1.
The main steps of the offline phase are as follows:
Step 1. Map the localization area to a Cartesian coordinate system: select the coordinate origin and construct the coordinate system.
Step 2. Collect the fingerprint data of the localization area with the fingerprint acquisition equipment, and associate the fingerprint data with the coordinates established in Step 1 to build the location fingerprint database of Bluetooth signals in the experimental environment. Multiple sets of data are collected at each fingerprint point, and the number can be chosen according to the actual situation. In this article, 10 sets of data are collected for each fingerprint point to reduce the cost of fingerprint creation.
Step 3. Optimize the data collected in Step 2 using formula (1), so that only one set of fingerprint data is kept for each fingerprint point.
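The offline reduction in Step 3 can be sketched as follows. Formula (1) is not reproduced in the text, so this sketch assumes one plausible reading of its description: discard the maximum and minimum reading and average the rest. The function names `reduce_samples` and `build_fingerprint_library` and the dictionary layout are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Coordinate = Tuple[int, int, int]


def reduce_samples(samples: Sequence[float]) -> float:
    """Collapse repeated RSSI readings of one beacon into a single value.

    Assumption (not confirmed by the source): drop one maximum and one
    minimum reading, then average the remaining readings.
    """
    if len(samples) < 3:
        raise ValueError("need at least three samples to trim max and min")
    trimmed_sum = sum(samples) - max(samples) - min(samples)
    return trimmed_sum / (len(samples) - 2)


def build_fingerprint_library(
    raw: Dict[Coordinate, List[List[float]]]
) -> Dict[Coordinate, List[float]]:
    """Map each grid coordinate to one reduced RSSI vector.

    `raw[coord]` holds the repeated scans at that point; each scan lists
    one RSSI value per beacon, in a fixed beacon order.
    """
    library: Dict[Coordinate, List[float]] = {}
    for coord, scans in raw.items():
        per_beacon = zip(*scans)  # transpose: all readings of beacon 1, 2, ...
        library[coord] = [reduce_samples(readings) for readings in per_beacon]
    return library
```

Keeping one vector per point means the online phase compares each measurement against one stored fingerprint per point rather than ten, which is the matching-time saving the text describes.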
The main steps of the online positioning module are as follows:
Step 1. Collect fingerprint data at the points to be measured and save the data in CSV format.
Step 2. Clean and denoise the data to reduce the impact of noise on the localization results. In this phase, multiple sets of data are collected for localization, and each set contains the RSSI and number of each of the nine Bluetooth beacons. In this article, several test points are set up, and 500-1000 sets of data are collected at each, of which about 10% are abnormal. These abnormal data have a large impact on localization, so the data must be cleaned first. Since the RSSI signal approximately follows a Gaussian distribution, this article uses the maximum likelihood method to estimate the mean of each RSSI signal at the point to be located, and eliminates abnormal readings that are too large or too small.
Step 3. Set the threshold R1. The K1 value of the first-layer KNN is obtained dynamically: select all fingerprint data in the fingerprint library whose Chebyshev distance from the point to be located is less than R1, and let K1 be the number of sets selected. K1 therefore differs between localization points, which avoids the unstable results caused by using the same K-value everywhere. Equation (3) is the Chebyshev distance, where X and Y are two vectors and X_i and Y_i are their corresponding elements:

$$D(X, Y) = \max_i |X_i - Y_i| \tag{3}$$

After the K1 sets of fingerprint data are obtained, the initial coordinates of the point to be positioned are calculated by equation (4), where (x_i, y_i, z_i) is the coordinate of the i-th selected fingerprint:

$$(\hat{x}, \hat{y}, \hat{z}) = \frac{1}{K_1} \sum_{i=1}^{K_1} (x_i, y_i, z_i) \tag{4}$$

Step 4. Set the threshold R2. The K1 coordinates obtained in Step 3 form the second fingerprint library, denoted S. Taking (x̂, ŷ, ẑ) from Step 3 as the center, find all coordinate points in S whose Euclidean distance from (x̂, ŷ, ẑ) is less than R2, and let K2 be their number. The final localization result is obtained by the multi-point center-of-mass method, which averages the K2 coordinates.
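Steps 3 and 4 can be illustrated with a minimal sketch of the two-layer matching, using the thresholds R1 and R2 as described. The function names, the dictionary-based fingerprint library, and the behavior when no point passes a threshold are assumptions made for illustration; the text does not specify what happens when K1 or K2 is zero.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Coordinate = Tuple[float, float, float]


def chebyshev(x: Sequence[float], y: Sequence[float]) -> float:
    """Chebyshev distance: the largest element-wise absolute difference."""
    return max(abs(a - b) for a, b in zip(x, y))


def centroid(points: Sequence[Coordinate]) -> Coordinate:
    """Multi-point center of mass: the plain average of the coordinates."""
    n = len(points)
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )


def locate(
    library: Dict[Coordinate, Sequence[float]],
    rssi: Sequence[float],
    r1: float,
    r2: float,
) -> Optional[Coordinate]:
    # Layer 1: keep every fingerprint whose RSSI vector lies within
    # Chebyshev distance r1 of the measurement; K1 = len(candidates).
    candidates: List[Coordinate] = [
        coord for coord, fp in library.items() if chebyshev(fp, rssi) < r1
    ]
    if not candidates:
        return None  # assumption: no estimate when nothing matches
    first = centroid(candidates)
    # Layer 2: keep candidates within Euclidean distance r2 of the first
    # estimate in coordinate space; K2 = len(kept).
    kept = [c for c in candidates if math.dist(c, first) < r2]
    # Assumption: fall back to the first estimate if layer 2 keeps nothing.
    return centroid(kept) if kept else first
```

Note that the first layer compares RSSI vectors (signal space) while the second compares coordinates (physical space), which is why a fingerprint that looks similar in signal but lies far away can be discarded in the second pass.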
In summary, the improved algorithm proposed in this article provides an accurate and specific solution to the localization problem in the cabin environment.
First, the collected fingerprint data are cleaned, and the parameters of its Gaussian model are estimated by the maximum likelihood method. Second, adaptive K-values are adopted to reduce the localization bias that a single K-value causes across different locations. Finally, the first localization results are screened by a secondary KNN matching that eliminates remote points and further improves the localization accuracy.
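The cleaning step can be sketched as below. The maximum likelihood estimates of a Gaussian are the sample mean and the 1/N standard deviation; the rejection rule itself (discard readings more than `k` standard deviations from the mean, with `k = 2.0`) is an assumption, since the text only says that abnormally large or small readings are eliminated.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_mle(samples: Sequence[float]) -> Tuple[float, float]:
    """Maximum likelihood estimates (mean, standard deviation) of a Gaussian."""
    n = len(samples)
    mu = sum(samples) / n
    sigma = math.sqrt(sum((s - mu) ** 2 for s in samples) / n)  # 1/N, not 1/(N-1)
    return mu, sigma


def clean_rssi(samples: Sequence[float], k: float = 2.0) -> List[float]:
    """Drop readings farther than k standard deviations from the mean.

    The k-sigma cutoff is a hypothetical choice for illustration.
    """
    mu, sigma = gaussian_mle(samples)
    if sigma == 0:
        return list(samples)  # all readings identical, nothing to reject
    return [s for s in samples if abs(s - mu) <= k * sigma]
```

After cleaning, the mean of the remaining readings for each beacon can serve as that beacon's RSSI value at the point to be located.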

Experiments and results
In this section, detailed experiments were conducted to evaluate the performance and localization accuracy of the adaptive K-values-based cabin indoor localization algorithm. The experiments were conducted in the standard cabin environment. The Bluetooth beacons use an nRF52810 chip supporting Bluetooth 4.0, with a broadcast power of −95 to 4 dB (set to 0 dB in this experiment) and a practical range within 50 m. The fingerprint collection tool is a self-developed Android app.

Deployment of the experimental environment
The experimental site is the Marine Engine Room Integrated Lab of Shanghai Maritime University. The experimental environment measures 23 m × 24 m × 15 m. The mainframe, which has three levels, is located in the middle; to its left is a three-level corridor and to its right a double-level corridor, both fitted with the instruments and equipment necessary for ship navigation. As shown in Figure 2, the internal environment is very complicated. To ensure that the Bluetooth signal covers the whole area, nine Bluetooth beacons are arranged according to the distribution of instruments and numbered 1 to 9 to facilitate the subsequent acquisition and identification of fingerprint data. The distribution of the numbered beacons is shown in Figure 3.

The collection and processing of the experimental data
In this experiment, the experimental environment of 23 m × 24 m × 15 m was divided into 1 m × 1 m × 1 m cells. Each cubic meter represents one fingerprint point, and the data collected for each fingerprint include coordinates, device number, and RSSI. The entire test area was divided into 8280 fingerprint points, but some of them were occupied by machinery or could not be reached, so a total of 2790 fingerprints were collected. Ten sets of data were collected for each fingerprint, totaling 27,900 sets of data, some of which are shown in Table 1.
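The grid counts quoted above follow directly from the room dimensions, as a short sketch confirms. The helper `grid_cells` and the choice of integer cell indices starting at a corner origin are assumptions for illustration.

```python
from itertools import product
from typing import List, Tuple


def grid_cells(size_m: Tuple[int, int, int], cell_m: int = 1) -> List[Tuple[int, int, int]]:
    """Enumerate the integer indices of all cubic cells in a box-shaped volume."""
    nx, ny, nz = (s // cell_m for s in size_m)
    return list(product(range(nx), range(ny), range(nz)))


cells = grid_cells((23, 24, 15))
collected_points = 2790   # fingerprint points actually surveyed (from the text)
scans_per_point = 10      # sets of data per fingerprint point (from the text)

print(len(cells))                          # 8280 candidate fingerprint points
print(collected_points * scans_per_point)  # 27900 sets of data in total
```

Roughly one third of the candidate cells (2790 of 8280) could be surveyed; the rest were occupied by machinery or otherwise unreachable.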
After the fingerprint library is established, the 10 sets of data for each point are processed by formula (1) to reduce localization matching time. The 1000 groups of RSSI values measured at a point to be located are used as test data. We find that the Bluetooth RSSI values at the same position approximately follow a Gaussian distribution; that is, the received RSSI values fit a Gaussian probability model when the distance is constant. The data distribution for multiple tests is shown in Figure 4, and the probability density function is given by formula (5), where μ is the mathematical expectation and σ is the standard deviation. In this article, maximum likelihood estimation 21 is used to obtain the values of μ and σ in the model. The solutions are given by formulas (6) and (7), where N is the number of RSSI values conforming to the Gaussian distribution. Please refer to the literature 22 for the specific derivation.
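The displayed formulas are not reproduced in the text. For reference, the standard Gaussian density and its maximum likelihood estimates, which are presumably what formulas (5) to (7) state, are given below, with μ the expectation, σ the standard deviation, and N the number of RSSI samples. The numbering is assumed to match the references in the paragraph above.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}}
       \exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right) \tag{5}

\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{RSSI}_i \tag{6}

\hat{\sigma}^{2} = \frac{1}{N}\sum_{i=1}^{N}
                   \left(\mathrm{RSSI}_i - \hat{\mu}\right)^{2} \tag{7}
```

Both estimates follow from setting the partial derivatives of the log-likelihood with respect to μ and σ² to zero; note the 1/N factor in the variance estimate, which differs from the unbiased 1/(N−1) sample variance.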

Comparison and analysis of the experimental results
Result analysis of the KNN algorithm. To verify the performance of the improved algorithm in the cabin environment, comparative experiments are conducted between the KNN algorithm and the KNN algorithm with adaptive K-values. In the experiments using the KNN algorithm, the Euclidean, Chebyshev, and Manhattan distances are each applied to several points to be localized, and the results are shown in Figure 5. The results show that the choice of K is closely related to localization accuracy. When K is small, both accuracy and stability are poor; when K is above 20, the localization results gradually become stable. Figure 5 also shows that, among the distance metrics, the Chebyshev distance performs better and gives higher localization accuracy in the cabin environment than the Euclidean and Manhattan distances. Therefore, the Chebyshev distance is used in the subsequent experiments.
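The baseline compared here is a fixed-K KNN with a selectable distance metric. A minimal sketch, assuming a dictionary-based fingerprint library and hypothetical function names, might look like this:

```python
import math
from typing import Callable, Dict, Sequence, Tuple

Coordinate = Tuple[float, float, float]
Metric = Callable[[Sequence[float], Sequence[float]], float]


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x: Sequence[float], y: Sequence[float]) -> float:
    return sum(abs(a - b) for a, b in zip(x, y))


def chebyshev(x: Sequence[float], y: Sequence[float]) -> float:
    return max(abs(a - b) for a, b in zip(x, y))


def knn_locate(
    library: Dict[Coordinate, Sequence[float]],
    rssi: Sequence[float],
    k: int,
    metric: Metric,
) -> Coordinate:
    """Average the coordinates of the k fingerprints nearest in signal space."""
    ranked = sorted(library.items(), key=lambda item: metric(item[1], rssi))
    nearest = [coord for coord, _ in ranked[:k]]
    n = len(nearest)
    return (
        sum(c[0] for c in nearest) / n,
        sum(c[1] for c in nearest) / n,
        sum(c[2] for c in nearest) / n,
    )
```

Sweeping `k` and swapping `metric` in a loop over test points would produce curves of the kind the text attributes to Figure 5.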
In the KNN algorithm test, several points to be localized were selected and 1000 sets of RSSI fingerprint data were collected at each point. One of the experimental results is shown in Figure 6. In this experiment, K is equal to 15. The real coordinate is (20, 9, 6) and the estimated coordinate is (21.26, 7.86, 4.8), giving a localization error of 2.08 m. However, localization with a fixed K-value does not generalize. When the same K-value is used at different test points, the positioning results fluctuate greatly, and the appearance of multiple outliers has a large impact on the final results.
Figure 5. Localization with different distance metrics as K changes. The three lines show the fingerprint positioning accuracy obtained with the Euclidean, Manhattan, and Chebyshev distances, respectively.
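The localization error quoted for this test point is the Euclidean distance between the real and estimated coordinates, which can be checked directly. The function name `localization_error` is a hypothetical label.

```python
import math
from typing import Sequence


def localization_error(true_pos: Sequence[float], estimate: Sequence[float]) -> float:
    """Euclidean distance in meters between the true and estimated positions."""
    return math.dist(true_pos, estimate)


err = localization_error((20, 9, 6), (21.26, 7.86, 4.8))
print(f"{err:.2f} m")  # 2.08 m
```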
Result analysis of KNN algorithm with adaptive K-values. The specific steps of the improved KNN algorithm are presented in section "KNN fingerprint localization algorithm based on adaptive K-values," in which the dynamic selection of K-values is achieved mainly through the thresholds R1 and R2. The values of R1 and R2 are therefore very important for the localization results. Figure 7 shows the results of the first localization for multiple sets of test data, with the horizontal axis indicating the threshold R1 and the vertical axis the localization error. First, before outlier removal, the first localization result must be relatively stable with a small error. Second, R1 should not be too small; otherwise K1 is too small for secondary localization. Figure 7 shows that when R1 is equal to 5.6, the positioning results of multiple sets of test data are relatively stable and the error is within 1.5-2.3 m. Therefore, R1 is set to 5.6. With R1 fixed, the outliers need to be further eliminated to minimize the localization error. Figure 8 shows the results of the secondary localization with R1 equal to 5.6. Among all the test results, the overall localization error is smallest when R2 is equal to 3.1, so R2 is set to 3.1 in this article.
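The threshold selection described above amounts to sweeping candidate values and comparing the resulting errors across test points. A simplified sketch follows; it picks the candidate with the smallest mean error, whereas the text also weighs stability and a sufficiently large K1 when choosing R1. The function `sweep_threshold` and the callback `errors_for` are hypothetical.

```python
from statistics import mean
from typing import Callable, Iterable, Sequence, Tuple


def sweep_threshold(
    candidates: Iterable[float],
    errors_for: Callable[[float], Sequence[float]],
) -> Tuple[float, float]:
    """Return (threshold, mean error) for the candidate with the lowest mean error.

    `errors_for(r)` is expected to run the localization on all test points
    with threshold r and return the per-point errors in meters.
    """
    best = None
    for r in candidates:
        errors = errors_for(r)
        if not errors:
            continue  # skip thresholds for which no test point could be located
        score = mean(errors)
        if best is None or score < best[1]:
            best = (r, score)
    if best is None:
        raise ValueError("no candidate threshold produced any result")
    return best
```

In a two-stage search of the kind described, R1 would be swept first with the first-layer estimate, and R2 swept afterwards with R1 held fixed.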
In the improved KNN algorithm, adaptive K-values improve the stability of localization, and the dual KNN reduces the impact of outliers on the final result. The experimental results for the selected test point are shown in Figure 9. Figure 9(a) shows that the first adaptive K-value is large and produces many outliers, which have a large impact on the localization results and therefore need to be screened out. The algorithm takes the first center point as the cluster center for the secondary adaptive K-value localization, which largely excludes the interference of the outlier points. As shown in Figure 9(b), more accurate localization results are obtained after secondary screening.
In this article, trajectory positioning was performed in the automated ship cabin laboratory to test the positioning performance of the improved KNN algorithm. The track is shown in Figure 10 (blue is the actual trajectory and green is the positioned trajectory). The experimental results show that the positioned trajectory is basically consistent with the actual trajectory. Unlike a typical indoor environment, the engine room is more complex and subject to more interference, yet the improved KNN algorithm localizes well there. In most cases, the localization error is within 1.41 m. Compared with the single KNN localization method, the improved KNN positioning algorithm significantly improves indoor positioning accuracy and stability in the cabin environment.
Overall analysis. In this article, the KNN algorithm is improved and the improved algorithm is verified in a shipboard environment. The results are shown in Table 2.
To address the poor stability of single K-value localization results, K-value adaptation is achieved by setting threshold parameters, which effectively improves localization stability in the cabin environment. In addition, to eliminate the influence of outliers on the localization results, dual localization is adopted, which transforms the multi-dimensional feature vectors into 3D spatial vectors and further improves the positioning accuracy. In 70% of cases, the average localization accuracy of the improved KNN algorithm in the cabin environment is within 1.41 m; the overall average error is within 1.74 m, and the variance is 0.34.
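Summary figures of this kind (mean error, variance, and the share of cases within a bound) can be computed from the per-point errors as sketched below. The function name is hypothetical, and the use of the population variance is an assumption, since the text does not say which variance definition was used.

```python
from statistics import mean, pvariance
from typing import Dict, Sequence


def summarize_errors(errors: Sequence[float], bound: float) -> Dict[str, float]:
    """Mean, population variance, and fraction of errors at or below `bound`."""
    return {
        "mean": mean(errors),
        "variance": pvariance(errors),
        "within_bound": sum(e <= bound for e in errors) / len(errors),
    }
```

For example, calling `summarize_errors(errors, bound=1.41)` on the measured per-point errors would give the three quantities reported above in one pass.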

Conclusion
To improve indoor positioning accuracy in the ship cabin environment, this article proposes a cabin indoor localization algorithm based on adaptive K-values and verifies its performance in a cabin environment. The algorithm is designed for the special cabin environment: multiple sets of fingerprint data are dynamically selected by adaptive K-values, and the KNN algorithm is used to obtain the initial localization; outlier detection is then introduced to perform secondary localization on the initial results. The experimental results show that the improved KNN algorithm effectively improves positioning accuracy and stability in the cabin environment, and using low-power Bluetooth as the signal base station significantly reduces the system deployment cost in the cabin environment.
This article provides a feasible solution for indoor localization in a cabin environment and, to a certain extent, fills the research gap in cabin indoor positioning. However, some problems remain. For example, constructing the fingerprint database in advance takes a lot of time, and as the localization scene grows, fingerprint matching takes longer and longer. These will be the focus of future research, which aims to further improve indoor localization accuracy in the cabin environment. We are considering whether the fingerprint information in the cabin environment can be simulated to establish a fingerprint attenuation model and automatically generate a fingerprint database, which would reduce the cost of fingerprint construction. In addition, a region of interest will be selected to reduce fingerprint matching time and increase localization accuracy.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported in part by the Innovation Program of Shanghai Municipal Education Commission (grant no. 2021-01-07-00-10-E00121) and in part by the Ministry of Transportation and Communications (grant no. Z20208141).